A New Item Response Theory Modeling Approach

Introduction.  Until  recently,  most  theoretical  and  applied  item  response  theory 
(IRT)  based  research  has  uncritically  assumed  one  of  a  small  set  of  unidimensional, 
locally  independent  monotone  parametric  models;  e.g.,  one-,  two-,  or  three-parameter 
logistic  and  normal  ogive  models  for  a  finite  item  test.  (See  Lord  [1980]  for  a 
survey  of  this  IRT  modeling  research  tradition  and  Mislevy  [1987]  for  a  survey  of 
current  IRT  modeling  research.) 

By  contrast,  this  paper  makes  a  determined  case  for  assuming  a  monotone 
nonparametric  (i.e.,  no  specific  functional  form  for  item  response  functions  assumed) 
infinite  item  pool  IRT  framework  with  local  independence  replaced  by  a  less 
restrictive  and,  we  claim,  psychometrically  more  appropriate  assumption,  namely 
essential  independence.  Essential  independence  provides  the  basis  for  assessing  the 
essential  dimensionality  of  test  data.  Essential  dimensionality,  much  in  the  spirit 
of  counting  the  number  of  dimensions  in  a  factor  analytic  model,  is  the  number  of 
major latent dimensions with minor dimensions ignored. Essential unidimensionality,
the existence of exactly one major dimension, then provides a justification for
carrying out IRT based statistical analyses that require unidimensionality. We favor
the  use  of  unidimensional  IRT  modeling  approaches  in  applications  when  they  are  used 
subsequent  to  a  careful  statistical  analysis  verifying  that  essential 
unidimensionality  fits  the  item  response  data  sufficiently  well.  On  the  other  hand, 
the  uncritical  use  of  the  standard  unidimensional  three  parameter  logistic  model  in 
applications  is  the  equivalent  of  Plato’s  cave  dweller’s  attempt  to  interpret  the 
outside  world  entirely  on  the  basis  of  shadows  cast  on  his  cave  wall  (see  Reckase, 
Carlson, Ackerman, and Spray [1986] and Wang [1988] for implications of the
uncritical  use  of  unidimensional  models  where  multidimensionality  holds). 

Other nonparametric approaches appear in the literature. Mokken scaling, with
its stress on the Loevinger homogeneity index, has received considerable attention.
See for example Mokken and Lewis (1982) and Sijtsma and Molenaar (1987). Cliff
(1977, 1979) proposes a non latent trait approach stressing the degree of consistency
of the order relationships between persons and between items. These approaches, with
their emphasis on test scalability, which is only peripherally related to essential
unidimensionality, are not closely related to the nonparametric approach of this
paper.

Suppose that one uses the infinite item pool modeling approach proposed here
and that essential unidimensionality is assumed, ideally subsequent to a
statistical analysis of essential dimensionality. It is established below that two
major consequences follow: (i) the uniqueness of the latent ability in an ordinal
scaling sense and (ii) the consistency of estimation of the unique latent ability.
Thus, latent ability can be consistently estimated in the essentially independent,
essentially unidimensional case, even if the usual local independence does not hold.

This  paper  continues  the  work  of  Stout  (1987),  where  essential  unidimensionality 
was  first  defined  and  a  statistical  test  of  essential  unidimensionality  proposed  and 
its  properties  and  performance  investigated.  Indeed,  it  is  important  to  emphasize 
that  for  psychological  test  data  the  nonparametric,  monotone,  essentially 
independent,  essentially  unidimensional  model  of  this  paper  can  be  tested  for  lack  of 
statistical  fit,  as  described  in  the  1987  paper. 

Our paper is organized as follows: Section 1 reviews the traditional IRT
modeling approach. Section 2 defines essential independence and essential
dimensionality and presents basic properties. Section 3 considers the consistent
estimation of ability, establishes the "uniqueness" of the latent trait, and
introduces  balanced  linear  empirical  scoring.  Section  4  considers  a  conceptual 
probabilistic  framework  for  the  generation  of  essentially  unidimensional  tests 
consisting  of  multidimensional  items.  Section  5  briefly  discusses  and  summarizes  the 
results  of  the  paper. 
1. The Traditional IRT Modeling Approach. According to the latent trait viewpoint, each
examinee is indexed by a possibly vector valued variable $\boldsymbol{\theta}$, with many examinees
permitted to be assigned to each $\boldsymbol{\theta}$. Associated with each item $i$ is an item response
function (IRF) $P_i(\boldsymbol{\theta})$ that denotes the probability that a randomly chosen examinee
from the set of examinees with ability $\boldsymbol{\theta}$ will get the item right. (Various researchers
prefer various interpretations of $P_i(\boldsymbol{\theta})$; this one is our preference. See Hambleton
and Swaminathan, 1985, pp. 26-27 for discussion.) Random sampling of examinees from
a specified population induces a distribution $F(\boldsymbol{\theta})$ on the latent trait space of
$\boldsymbol{\theta}$s and hence a distribution for the test response $\mathbf{U}_N = (U_1, \ldots, U_N)$ of a randomly
chosen examinee. The random test response vector $\mathbf{U}_N$ will often be referred to as
the "test". Similarly, the random variable $U_i$ will often be referred to as the $i$th
"item". Observed values of $\mathbf{U}_N$ and $U_i$ will be denoted by $\mathbf{u}_N$ and $u_i$
respectively. $\mathbf{u}_N = (u_1, \ldots, u_N)$ will always be a sequence of 0s and 1s. $U_i = 1$
denotes a correct response and $U_i = 0$ denotes an incorrect response to item $i$ for a
randomly chosen examinee. The latent random vector is denoted by $\boldsymbol{\Theta}$ and particular
values taken on by $\boldsymbol{\Theta}$ are denoted by $\boldsymbol{\theta}$. Note that
$P_i(\boldsymbol{\theta}) = P[U_i = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}]$ for all $i$, $\boldsymbol{\theta}$. For notational convenience, let
$P(\mathbf{u}_N \mid \boldsymbol{\theta})$ denote the conditional distribution
$P[\mathbf{U}_N = \mathbf{u}_N \mid \boldsymbol{\Theta} = \boldsymbol{\theta}]$. It is important to stress that a "test"
$\mathbf{U}_N$ can have many possible latent trait models. That is, there are many choices of
the pair $F(\boldsymbol{\theta})$, $P(\mathbf{u}_N \mid \boldsymbol{\theta})$ such that, for all $\mathbf{u}_N$,

$$P[\mathbf{U}_N = \mathbf{u}_N] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} P(\mathbf{u}_N \mid \boldsymbol{\theta}) \, dF(\boldsymbol{\theta}). \tag{1}$$

Note here that $(\mathbf{U}_N, \boldsymbol{\Theta})$ are a pair of random vectors whose joint distribution is
specified by the marginal distribution $F(\boldsymbol{\theta})$ and the conditional distribution
$P(\mathbf{u}_N \mid \boldsymbol{\theta})$. Latent models for $\mathbf{U}_N$ will be denoted by
$(\mathbf{U}_N, \boldsymbol{\Theta}, P(\mathbf{u}_N \mid \boldsymbol{\theta}), F(\boldsymbol{\theta}))$ or for
brevity by $(\mathbf{U}_N, \boldsymbol{\Theta})$.
Three characteristics of latent models are of considerable importance:

(i) The model $(\mathbf{U}_N, \boldsymbol{\Theta})$ is said to be monotone (M) if
$P[U_{i_1} = 1, \ldots, U_{i_n} = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}]$
is nondecreasing in $\boldsymbol{\theta}$ for each subtest $(U_{i_1}, \ldots, U_{i_n})$ of $\mathbf{U}_N$ (here by
definition $\boldsymbol{\theta}_1 \le \boldsymbol{\theta}_2$ if and only if $\theta_{1i} \le \theta_{2i}$ for each coordinate $i$; also
$1 \le i_1 < \cdots < i_n \le N$). The model is said to be strictly monotone if
"nondecreasing" is replaced by "strictly increasing."

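(ii) The model $(\mathbf{U}_N, \boldsymbol{\Theta})$ is said to be d dimensional if $\boldsymbol{\Theta}$ is a d dimensional
random vector. The d dimensional ability is then denoted by
$\boldsymbol{\Theta} = (\Theta_1, \ldots, \Theta_d)$.
The dimensionality of $\boldsymbol{\Theta}$ will be denoted by $\dim(\boldsymbol{\Theta})$ or $d$.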

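(iii) The model $(\mathbf{U}_N, \boldsymbol{\Theta})$ is said to be locally independent (LI) if

$$P(\mathbf{u}_N \mid \boldsymbol{\theta}) = P[U_1 = u_1, \ldots, U_N = u_N \mid \boldsymbol{\Theta} = \boldsymbol{\theta}] = \prod_{i=1}^{N} P[U_i = u_i \mid \boldsymbol{\Theta} = \boldsymbol{\theta}] = \prod_{i=1}^{N} P_i(\boldsymbol{\theta})^{u_i} [1 - P_i(\boldsymbol{\theta})]^{1 - u_i} \tag{2}$$

for all $\boldsymbol{\theta}$ and each of the $2^N$ choices of $(u_1, \ldots, u_N)$.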

The most commonly used class of models has been the LI, M, d = 1 models. In
this case M is equivalent to the IRFs all being monotone. Usually for models when
M, d = 1 holds, the IRFs are typically strictly monotone. Note that in the LI, d = 1
case with the latent distribution $F(\theta)$ having density $f(\theta)$, (1) and (2) combine
to produce the "usual" IRT model equation

$$P[\mathbf{U}_N = \mathbf{u}_N] = \int_{-\infty}^{\infty} \left\{ \prod_{i=1}^{N} P_i(\theta)^{u_i} [1 - P_i(\theta)]^{1 - u_i} \right\} f(\theta) \, d\theta. \tag{3}$$
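
As a numerical illustration of the marginal model equation (3), the following minimal sketch approximates the probability of each response pattern for a short test. The two-parameter logistic IRF, the standard normal density, the item parameters, and all function names are assumptions made for illustration only; the paper itself does not commit to any parametric form.

```python
import math
from itertools import product


def irf_2pl(theta, a, b):
    """Two-parameter logistic IRF (an assumed parametric form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def normal_density(theta):
    """Standard normal density, used here as the assumed f(theta)."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


def pattern_probability(u, items, lo=-8.0, hi=8.0, steps=4000):
    """Approximate equation (3) with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        theta = lo + (s + 0.5) * h
        likelihood = 1.0
        # Local independence: the conditional likelihood is a product over items.
        for u_i, (a, b) in zip(u, items):
            p = irf_2pl(theta, a, b)
            likelihood *= p if u_i == 1 else 1.0 - p
        total += likelihood * normal_density(theta) * h
    return total


# Hypothetical (a, b) parameters for a three-item test.
ITEMS = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]
PROBS = {u: pattern_probability(u, ITEMS) for u in product((0, 1), repeat=3)}
```

Because the eight response patterns exhaust all outcomes, the values in `PROBS` should sum to 1 up to quadrature error, which gives a quick sanity check on any implementation of (3).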

2. A New Conceptualization of Test Dimensionality. Let us recall the traditional
IRT definition of test dimensionality that almost always applies in IRT models:

Definition 2.1. The dimensionality $d$ of a test $\mathbf{U}_N$ is the minimal dimensionality
required for $\boldsymbol{\Theta}$ to produce a latent model $(\mathbf{U}_N, \boldsymbol{\Theta})$ that is both LI
and M. □

Although  mathematically  appealing,  this  definition  is  rather  impractical  for 
mental  testing  because,  in  actual  practice,  individual  test  items  clearly  have 
multiple determinants of their respective probabilities of correct response. This
position  has  been  pursued  clearly  and  vigorously  by  Humphreys  (1984),  who  states: 

The  related  problems  of  dimensionality  and  bias  of  items  are  being  approached  in 
an arbitrary and oversimplified fashion. It should be obvious that unidimensionality
can only be approximated. ... The large amount of unique variance in items is not
random error, although it can be called error from the point of view of the attribute
that one is attempting to measure. ... We start with the assumption that responses to
items  have  many  causes  or  determinants. 

Humphreys  (1984)  points  out  that  a  dominant  attribute  (i.e.,  dominant  dimension) 
results  from  an  attribute  overlapping  many  items  and  asserts  that  attributes  common 
to  relatively  few  items  or  even  unique  to  individual  items  are  unavoidable  and  indeed 
are  not  detrimental  to  the  measurement  of  a  dominant  dimension.  In  his  writings, 
Humphreys  stresses  that  the  low  item  intercorrelations  typically  observed  argue 
strongly for viewing individual items as determined by multiple attributes. Although
the existence of multiply determined items is rarely emphasized in the IRT
literature, it is a theme with a long history in the factor analytic test theory
literature. Classical factor analysis applied to binary test data of course
implicitly assumes the possibility of many determinants, allowing for many
determinants specific to individual items in addition to one or more dominant
dimensions. McDonald (1981) actually argues for the existence of "minor components"
in factor analytic modeling of test data. That is, he argues for the existence of
multiple determinants, many of which are common to relatively few items at most.
Tucker, Koopman, and Linn (1969) have developed a factor analytic test simulation
model  that  includes  "minor  factors"  as  well  as  dominant  factors  and  unique  factors. 

Unfortunately,  the  traditional  definition  (Definition  2.1),  based  on  local 
independence,  makes  no  distinction  between  dominant  and  minor  dimensions.  Thus,  if 
taken  seriously,  this  definition  compels  us  to  take  as  test  dimensionality  the  total 
number  of  all  item  dimensions  rather  than  adopting  the  more  appropriate  "factor 
analytic  viewpoint"  by  which  only  the  number  of  dominant  dimensions  is  counted.  This 
is  true  even  in  situations  with  only  one  dominant  dimension  where,  both  from  the 
viewpoint of psychometric verity and of modeling parsimony, it would be desirable to
ignore multiple determinants (i.e., minor and unique factors) and categorize tests as
unidimensional. Thus the traditional definition requires us to assign dimensionality
$d = d_0 > 1$ ($d_0$ possibly quite large in fact) in settings where it would be
desirable to assign $d = 1$. For example, if all items of a long test depend on $\theta_1$,
but Items 1 and 2 alone also depend on $\theta_2$, then $d = 2$.

Clearly,  it  is  an  important  psychometric  goal  to  be  able  to  statistically  assess 
whether or not a test $\mathbf{U}_N$ is driven by exactly one dominant dimension. As a
necessary precursor, a mathematical conceptualization of the number of dominant
dimensions is needed, namely the essential dimensionality of a collection of test
items. In order to present a rigorous definition of essential dimensionality, it is
necessary to conceptualize $\mathbf{U}_N$ as the initial observed segment of an infinite item
pool $\{U_i, i \ge 1\}$. Here, it is assumed that whatever process has been used to
construct the first $N$ items of the pool making up the test $\mathbf{U}_N$ could be continued
in the same manner by including further items from the pool. Thus the infinite item
pool $\{U_i, i \ge 1\}$ is of the same dimensional character as $\mathbf{U}_N = \{U_i, 1 \le i \le N\}$. If
an actual item banking scheme with random sampling of items is being used to
construct the test, then $\mathbf{U}_N$ and $\{U_i, i \ge 1\}$ being of the same dimensional character is
guaranteed. Such random sampling of items is often used for criterion referenced
tests constructed from item banks; see for example Hambleton and Swaminathan (1985,
Chapter 12). A latent model for $\{U_i, i \ge 1\}$ is denoted by $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ or
more completely by $\{\mathbf{U}_N, \boldsymbol{\Theta}, P(\mathbf{u}_N \mid \boldsymbol{\theta}), F(\boldsymbol{\theta}), N \ge 1\}$, thus emphasizing that adding
successive items from the item pool generates a sequence of tests.

It will be assumed throughout the remainder of the paper that $\mathbf{U}_N$ consists of
the first $N$ items of an infinite item test $\{U_i, i \ge 1\}$. This will be referred to
as the infinite item pool formulation of IRT. This proposed replacement of $\mathbf{U}_N$ for
fixed $N$ by $\{U_i, i \ge 1\}$ in IRT modeling is a specific instance of a standard and
useful modeling device used throughout mathematical statistics. For example, in
order to study the performance of estimating a population mean by the sample average,
an infinite random sample $\{X_i, i \ge 1\}$ is often posited, thus enabling one to
examine the asymptotic properties of $\bar{X}_N = \sum_{i=1}^{N} X_i / N$ as $N \to \infty$.

Our modeling approach deliberately tolerates the test $\mathbf{U}_N$ (and hence the
infinite item pool) containing an insignificant number of atypical items. For
example every $2^k$-th item of a "mathematics" item pool could be a verbal item. The
allowing of an "insignificant number" of atypical items is facilitated by the
introduction of the concept of a collection of nonsparse subtests: Consider the
sequence of tests $\{\mathbf{U}_N, N \ge 1\}$ obtained from the item pool $\{U_i, i \ge 1\}$ by iterating
$\mathbf{U}_{N+1} = (\mathbf{U}_N, U_{N+1})$. A subtest of $\mathbf{U}_N$ will be denoted by
$A_N = \{U_{i_1}, U_{i_2}, \ldots, U_{i_{l(N)}}\}$ for each $N \ge 1$. Thus $l(N) \le N$ denotes the length of the
subtest $A_N$. A particular collection of subtests $\{A_N, N \ge 1\}$ is termed nonsparse if it
is nested, $A_N \subset A_{N+1}$ for $N \ge 1$ (that is, all items of $A_N$ are also in $A_{N+1}$), and there
exists $\delta > 0$ such that

$$\frac{l(N)}{N} \ge \delta \tag{4}$$

for all $N \ge 1$. That is, (4) requires that the length of each subtest of the
collection must exceed a fixed possibly small proportion of the length $N$ of the
corresponding test $\mathbf{U}_N$. Roughly, a test is two dimensional if it has a subtest
measuring a second ability different from the latent ability of interest. For our
infinite item pool formulation, this translates into a sequence of subtests
$\{A_N, N \ge 1\}$ measuring this second ability. The theory of essential
unidimensionality then requires that these subtests be nonsparse and hence typical of
the infinite item pool.
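
The nonsparsity requirement, that the subtest length stays above a fixed proportion of the test length, can be explored numerically over a finite horizon. The sketch below is only a finite-N proxy for the asymptotic condition; the function names, the horizon of 10,000 items, and the two membership rules are hypothetical choices made for illustration. Subtests defined by a fixed membership rule are automatically nested.

```python
def subtest_lengths(in_subtest, max_n):
    """Return [l(1), ..., l(max_n)], where l(N) counts items i <= N in the subtest."""
    lengths, count = [], 0
    for i in range(1, max_n + 1):
        if in_subtest(i):
            count += 1
        lengths.append(count)
    return lengths


def min_proportion(in_subtest, max_n, start=1):
    """Smallest l(N)/N over start <= N <= max_n (a finite-horizon proxy)."""
    lengths = subtest_lengths(in_subtest, max_n)
    return min(lengths[n - 1] / n for n in range(start, max_n + 1))


def is_power_of_two(i):
    """True for 1, 2, 4, 8, ... (positive integers with a single set bit)."""
    return i & (i - 1) == 0


# Every 10th item: the proportion stays bounded away from zero.
every_tenth = min_proportion(lambda i: i % 10 == 0, 10_000, start=10)
# Items at powers of two: the proportion keeps shrinking as N grows.
powers_of_two = min_proportion(is_power_of_two, 10_000, start=10)
```

Under these assumptions the every-tenth-item rule never falls below 1/19, while the powers-of-two rule has already dropped to 14/10000 at the end of the horizon, matching the intuition that the former is nonsparse and the latter is an "insignificant number" of items.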

We next define a weaker type of independence than local independence called
essential independence. The intuitive idea is that conditional on the latent random
variable, the covariances between items are "small" on average, regardless of which
collection of nonsparse subtests is being used.

Definition 2.2. The latent model $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ is said to be essentially
independent (EI) if for every collection of nonsparse subtests $\{A_N, N \ge 1\}$, for each $\boldsymbol{\theta}$ in the
range of $\boldsymbol{\Theta}$,

$$\binom{l(N)}{2}^{-1} \sum_{U_i, U_j \in A_N,\; i < j} \operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \to 0 \quad \text{as } N \to \infty. \tag{5}$$

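Remarks. (i) From a measure-theoretic probability perspective, we mean by
$\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ that the random variables $\{U_i, i \ge 1\}$ and the random vector $\boldsymbol{\Theta}$ are
defined on a common probability space. In this paper issues of measure-theoretic
probability rigor, although always surmountable, are suppressed in the interest of
clarity.

(ii) The content of (5) is that $\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta})$ must be small on average
for a wide class of subtests, the collections of nonsparse subtests. For example,
taking $A_N = \mathbf{U}_N$, an EI latent model must satisfy for each $\boldsymbol{\theta}$ in the range of $\boldsymbol{\Theta}$

$$\binom{N}{2}^{-1} \sum_{1 \le i < j \le N} \operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \to 0 \quad \text{as } N \to \infty. \tag{6}$$

(iii) $\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \ne 0$ holds if and only if there is latent information
beyond knowing $\boldsymbol{\Theta} = \boldsymbol{\theta}$ that influences examinee performance on the item pair
$(U_i, U_j)$.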

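(iv) Observe that a sufficient condition implying (5) for all collections of
nonsparse subtests is that for each $\boldsymbol{\theta}$

$$D_N(\boldsymbol{\theta}) \equiv \binom{N}{2}^{-1} \sum_{1 \le i < j \le N} \left| \operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \right| \to 0 \quad \text{as } N \to \infty. \tag{7}$$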
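
To make the difference between an average of conditional covariances and an average of their magnitudes concrete, the following minimal sketch computes both for a small conditional covariance matrix at one fixed value of the latent trait. It assumes, as the surrounding discussion suggests, that the condition over subtests averages signed covariances while the sufficient condition (7) averages absolute values. The function name and the 4-item matrix are hypothetical.

```python
from itertools import combinations


def average_cov(cov, indices=None, absolute=False):
    """Average of Cov(U_i, U_j | theta) over pairs i < j drawn from `indices`.

    `cov` is a square nested list of conditional covariances at one fixed
    theta; absolute=True averages magnitudes, as in the sufficient condition.
    """
    if indices is None:
        indices = range(len(cov))
    pairs = list(combinations(indices, 2))
    if not pairs:
        raise ValueError("need at least two items")
    values = (abs(cov[i][j]) if absolute else cov[i][j] for i, j in pairs)
    return sum(values) / len(pairs)


# Hypothetical 4-item conditional covariance matrix whose signs cancel.
COV = [
    [0.25, 0.02, -0.02, 0.00],
    [0.02, 0.25, 0.00, -0.02],
    [-0.02, 0.00, 0.25, 0.02],
    [0.00, -0.02, 0.02, 0.25],
]
signed = average_cov(COV)
magnitude = average_cov(COV, absolute=True)
subtest_signed = average_cov(COV, indices=[0, 1])
```

Here the signed average over the full test is zero while the average magnitude is not, and the signed average over the subtest made of the first two items is positive. This is why the condition is imposed over every collection of nonsparse subtests rather than over the full test alone.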


It is informative to contrast the definition of essential independence given by
(5) with the traditional latent trait conceptualization of local independence given
in (2). LI implies independence of all pairs $(U_i, U_j)$, given $\boldsymbol{\Theta} = \boldsymbol{\theta}$, which is
equivalent to $\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) = 0$ for all $\boldsymbol{\theta}$. By contrast, EI is a weaker
assumption than LI and only requires that for each fixed $\boldsymbol{\theta}$, $\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta})$ is
small in magnitude on average as the test length $N$ grows. The psychometric
interpretation of EI is that $\boldsymbol{\Theta}$ measures those individual examinee differences
that are essential or dominant in influencing item pool performance, whereas $\boldsymbol{\Theta}$
must be augmented to $(\boldsymbol{\Theta}, \boldsymbol{\Theta}')$ in order that $(\boldsymbol{\Theta}, \boldsymbol{\Theta}')$ measure all individual
differences that influence any of the items of the item pool; that is, LI holds for
$\{\mathbf{U}_N, (\boldsymbol{\Theta}, \boldsymbol{\Theta}'), N \ge 1\}$. $\boldsymbol{\Theta}'$ here consists of dimensions that have an inessential or minor
influence. For example, a component of $\boldsymbol{\Theta}'$ might influence examinee performance on
only one item.

From the mathematical viewpoint, it is not known whether it is possible to
construct a latent model for which essential independence holds (i.e., (5) for
nonsparse subtests) and yet (7) fails. Indeed, we conjecture but have not yet proved
that under a very mild hypothesis (5) does imply (7). However, from the applied
psychometric perspective, it is very plausible that (5) implies (7) in all realistic
IRT models of useful tests. To see this, suppose (7) fails. For simplicity, suppose
that (7) fails in the sense that there exists $\epsilon > 0$ and some value $\boldsymbol{\theta}$ of the latent
variable $\boldsymbol{\Theta}$ for which $D_N(\boldsymbol{\theta}) > \epsilon$ for all $N$, rather than the technically correct
infinitely many $N$. Fix such a $\boldsymbol{\theta}$. Because $|\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta})| \le 1$ for all $(i, j)$
pairs and $D_N(\boldsymbol{\theta}) > \epsilon$ for all $N$, it follows that for each $N$ there must exist many
item pairs $(U_i, U_j)$, $1 \le i < j \le N$, for which

$$\left| \operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \right| > \frac{\epsilon}{2}.$$

Thus it is plausible that there exists a collection $\{A_N, N \ge 1\}$ of nonsparse
subtests for which the above inequality holds for all item pairs of the collection.
Let, recalling the preceding paragraph, $(\boldsymbol{\Theta}, \boldsymbol{\Theta}')$ be the augmentation of $\boldsymbol{\Theta}$
that produces LI for $\{A_N, N \ge 1\}$. Item pairs from a set of items all with similar
latent factor loadings on the components of $\boldsymbol{\Theta}'$ ("similar latent factor loadings"
to be informally interpreted) tend to be non-negatively correlated conditional on a
specific value of the latent vector $\boldsymbol{\Theta}$. Thus, it seems plausible that one can
extract from $\{A_N, N \ge 1\}$ a collection of nonsparse subtests $\{A'_N, N \ge 1\}$, i.e.,
$A'_N \subset A_N$ for all $N$, for which $\operatorname{Cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) > \epsilon / 2$
for all item pairs of $\{A'_N, N \ge 1\}$. That is, (5) fails for a collection of nonsparse
subtests. Thus, it is, as claimed, very plausible that (5) implies (7).

Because it is plausible that (5) and (7) are equivalent and because (7) is so
much easier to interpret, it is often useful to interpret EI using (7) instead of
(5); the reader is encouraged to do so when appropriate. Also, when it is necessary
to prove mathematically that EI holds, often this is done by proving that (7) holds
rather than (5).

Just as LI can be weakened to EI, monotonicity can be weakened to weak
monotonicity:

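Definition 2.3. $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ is said to be weakly monotone (WM) if

$$\frac{1}{N} \sum_{i=1}^{N} P_i(\boldsymbol{\theta}) \ \text{is nondecreasing in } \boldsymbol{\theta} \tag{8}$$

for all $N$.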

In most of the results of this paper, it will suffice to assume WM instead of M.
Note that WM does allow for some nonmonotone IRFs in the item pool.

Now essential dimensionality can be defined.

Definition 2.4. The essential dimensionality $d_E$ of a test $\{U_i, i \ge 1\}$ is the
minimal dimensionality required for a latent trait $\boldsymbol{\Theta}$ to make the latent model
$\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ an EI, WM model. When $d_E = 1$, essential unidimensionality is
said to hold. If essential $d_E$ dimensionality holds using ability $\boldsymbol{\Theta}$ then
$\{U_i, i \ge 1\}$ is said to be essentially $d_E$ dimensional with respect to ability $\boldsymbol{\Theta}$.
Such a trait is called an essential trait for $\{U_i, i \ge 1\}$.

Remarks. (i) Although $d_E = 0$ is theoretically possible, it is psychometrically
uninteresting and does not occur in well designed tests. Thus, to avoid irrelevant
trivialities we assume $d_E \ge 1$ throughout this paper.

(ii) It is vital to note that the essential dimensionality (traditional
dimensionality too) depends simultaneously on both the infinite item pool and the
examinee population. The dimensionality of a test administration is determined by
the  interaction  between  items  and  examinees.  The  dimensionality  of  a  test 
administration  cannot  be  assigned  without  consideration  of  the  examinee  population  as 
well  as  the  test  items.  Nonetheless  it  is  a  linguistic  convenience  to  refer  to  the 
"test  dimensionality." 

The theorems and examples in the remainder of Section 2 combine to produce a
partial taxonomy of infinite item pools $\{U_i, i \ge 1\}$ for which essential
unidimensionality holds. Many variations and combinations of these theorems and
examples are easily derivable. In essence, it is shown that essential
unidimensionality holds for an item pool $\{U_i, i \ge 1\}$ if

(i)  only  a  "nondense"  subsequence  of  the  items  depends  on  an  ability  (or 
abilities)  other  than  the  ability  of  interest, 

(ii) each ability other than the ability of interest influences at most $K < \infty$
items  and  moreover  these  incidental  abilities  are  "orthogonal"  to  each  other, 
conditional  on  the  ability  of  interest,  or 

(iii)  the  magnitude  of  the  dependence  of  items  on  an  ability  (or  abilities) 
other  than  the  ability  of  interest  is  asymptotically  negligible,  even  though  a 
"dense"  set  of  items  of  the  test  may  depend  on  this  other  ability. 

Note  that  in  each  case  (i)  -  (iii)  it  is  intuitively  clear  that  there  is  one 
dominant  dimension  with  possibly  many  minor  dimensions;  that  is,  essential 
unidimensionality  clearly  ought  to  hold. 

Recall that we propose using $\{U_i, i \ge 1\}$ as a model for a given finite item
test. From this viewpoint (i) - (iii) holding for $\{U_i, i \ge 1\}$ translates into
(i') - (iii') respectively holding for the finite test $\mathbf{U}_N$:

(i') Few items of $\mathbf{U}_N$ depend on an ability (or abilities) other than the
ability of interest,

(ii') each ability other than the ability of interest influences at most a
small number of items of $\mathbf{U}_N$ and moreover these incidental abilities are
"orthogonal" to each other, conditional on the ability of interest,

(iii') the magnitude of the dependence of the items of $\mathbf{U}_N$ on an ability (or
abilities) other than the ability of interest is small, even though most of the items
may depend on this other ability.

The purpose of the statistical test proposed in Stout (1987) is to assess for a
test $\mathbf{U}_N$ administered to a population of examinees whether essential
unidimensionality provides a good data fit, for example, because (i'), (ii'), or (iii')
holds. In Section 4 a model for test item construction is proposed, making explicit
how we view the dimensional character of $\mathbf{U}_N$ as being the same as that of
$\{U_i, i \ge 1\}$. This helps demonstrate the close relationship between (i) and (i'),
between (ii) and (ii'), and between (iii) and (iii').

The basic thrust of essential unidimensionality (recall Definition 2.4) is that
the same unidimensional latent trait $\Theta$ makes the average conditional covariances
of item pairs small for all collections of nonsparse subtests. This will hold even
if a "nondense" subsequence of items in the infinite item test depends on an ability
other than that intended to be measured, as the following example illustrates.

Example 2.1. Assume WM. Suppose every $2^k$-th item is a pure verbal ($\theta_2$) item and
the rest are pure mathematics ($\theta_1$) items, with mathematics the ability intended to
be measured. Thus we suppose for all $\theta_1$ that

$$\operatorname{Cov}(U_i, U_j \mid \Theta_1 = \theta_1) = 0$$

for all pairs $(U_i, U_j)$ of mathematics items. An easy calculation shows that (7) and
hence that (5) holds; i.e., essential unidimensionality holds. □
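
The "easy calculation" in Example 2.1 can be sketched numerically. Assuming the atypical (verbal) items sit at the indices 2, 4, 8, ... and bounding every covariance that involves a verbal item by 1, the average absolute conditional covariance is at most the fraction of item pairs that touch a verbal item. The function names and the chosen test lengths below are illustrative assumptions.

```python
from math import comb


def atypical_count(n_items):
    """Number of indices 2, 4, 8, ... (powers of two) not exceeding n_items."""
    count, index = 0, 2
    while index <= n_items:
        count += 1
        index *= 2
    return count


def dn_upper_bound(n_items):
    """Bound D_N by the fraction of pairs touching at least one atypical item.

    Pairs of mathematics items contribute 0 by assumption; every other pair
    is bounded using |cov| <= 1.
    """
    m = atypical_count(n_items)
    touching = comb(n_items, 2) - comb(n_items - m, 2)
    return touching / comb(n_items, 2)


BOUNDS = {n: dn_upper_bound(n) for n in (16, 256, 4096, 65536)}
```

Since the number of verbal items grows only like the base-2 logarithm of N, the bound behaves roughly like 2 log2(N) / N and shrinks toward zero as the test length grows, which is the content of the example.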

The above example is easily abstracted into a conceptually useful theorem.
First, a set of indices $\{i_1 < i_2 < \cdots\}$ is said to be nondense if $k / i_k \to 0$ as
$k \to \infty$.
Theorem 2.1. Assume WM. Let $A = \{i_1 < i_2 < \cdots\}$ denote a nondense set of indices.
Suppose for an item pool $\{U_i, i \ge 1\}$ there exists a unidimensional latent ability
$\Theta$ for which for all $\theta$

$$\sup_{i \notin A,\; j \notin A,\; |i - j| > N} \left| \operatorname{Cov}(U_i, U_j \mid \Theta = \theta) \right| \to 0$$

as $N \to \infty$. Then essential unidimensionality with respect to $\Theta$ holds.

Remark. The hypothesis of Theorem 2.1 states that, except for item pairs coming from a
nondense sequence of items, the magnitude of the covariances, conditional on $\Theta$,
asymptotically approaches 0. For example, it might be that items with indices in $A$
depend on other dimensions besides $\Theta$.

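Proof. Fix $\theta$. Fix $1 > \epsilon > 0$. Let $\operatorname{Num}(B)$ denote the cardinality of any set $B$.
Let $A_N = \{i_k \text{ in } A \text{ such that } i_k \le N\}$. Choose $N_0$ such that

$$\sup_{i \notin A,\; j \notin A,\; |i - j| > N_0} \left| \operatorname{Cov}(U_i, U_j \mid \Theta = \theta) \right| < \epsilon, \qquad \frac{\operatorname{Num}(A_N)}{N} < \epsilon$$

for all $N \ge N_0$. Split item pairs from $\{U_i, 1 \le i \le N\}$ up into those for which at
least one item is in $A_N$; those for which $|i - j| \le N_0$ and both $i \notin A_N$, $j \notin A_N$;
and those for which $|i - j| > N_0$ and both $i \notin A_N$, $j \notin A_N$. Then, for $N \ge N_0$, noting
that $|\operatorname{Cov}(U_i, U_j \mid \Theta = \theta)| \le 1$ for all $i$, $j$, $\theta$,

$$\sum_{1 \le i < j \le N} \left| \operatorname{Cov}(U_i, U_j \mid \Theta = \theta) \right| \le \epsilon N^2 + N N_0 + \epsilon N^2.$$

Dividing by $\binom{N}{2} = N(N-1)/2$ gives $D_N(\theta) \le (4 \epsilon N + 2 N_0)/(N - 1) \le 5\epsilon$ for $N$ large.
Since $\epsilon$ is arbitrary, this establishes (7) and hence establishes
essential unidimensionality. □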

The  following  example  suggests  how  essential  unidimensionality  can  fail  when  two 
psychometric  dimensions  are  present. 

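Example 2.2. Modify Example 2.1 by making every 10k-th item a verbal item. Assume
there exists $\epsilon > 0$ and $\theta_1$ such that $\operatorname{Cov}(U_i, U_j \mid \Theta_1 = \theta_1) \ge \epsilon$ for all pairs of
verbal items. Consider $A_N = \{U_{10}, U_{20}, \ldots, U_{10k}\}$ for $N = 10k$; $k = 1, 2, \ldots$. Then, it is
easily seen that the average of $\operatorname{Cov}(U_i, U_j \mid \Theta_1 = \theta_1)$ over the item pairs of $A_N$ is
at least $\epsilon$, so it does not converge to 0 as $N \to \infty$, and hence that (5) does not hold
for all collections of nonsparse verbal subtests. The point is that $\{U_{10}, U_{20}, \ldots\}$
really is measuring a different dimension (verbal) from the intended dimension to be
measured (mathematics). Hence the average of the items $U_{10i}$ is an empirical verbal
scale, the average of the items $U_{2i+1}$ is an empirical mathematics scale, and
$\sum_{i=1}^{N} U_i / N$ is a combined mathematics and verbal scale. □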

The following commonly occurring test setting illustrates that essential
dimensionality  (Definition  2.4)  and  traditional  dimensionality  (Definition  2.1)  can 
differ  considerably.  It  also  illustrates  a  setting  where  essential  unidimensionality 
is  the  result  of  (ii). 

Example 2.3. Consider the construction of a paragraph comprehension test of length
$N = 5n$, where $n$ = number of paragraphs and each paragraph is followed by five
related questions. Assume total independence between questions involving different
paragraphs given $\Theta$, where for convenience we think of $\Theta$ as reading ability.
Suppose WM with respect to $\Theta$. Note, using $|\operatorname{Cov}(U_i, U_j \mid \Theta = \theta)| \le 1$ for all $i$, $j$,
$\theta$, that

$$D_N(\theta) \le n \binom{5}{2} \Big/ \binom{N}{2} = \frac{4}{N - 1} \to 0 \quad \text{as } N \to \infty.$$

Thus essential unidimensionality holds, whereas a traditional dimensionality of
$n + 1$ seems necessary for a test of length $N = 5n$. Reading ability $\Theta$ is the
essential trait for this essentially unidimensional model. □
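
The counting behind Example 2.3 can be checked directly. The sketch below, with hypothetical function and variable names, counts within-paragraph item pairs (the only pairs allowed a nonzero conditional covariance) and divides by the total number of pairs, using the crude bound that each covariance magnitude is at most 1.

```python
from math import comb


def paragraph_bound(n_paragraphs, questions_per_paragraph=5):
    """Upper bound on D_N when only within-paragraph pairs can covary.

    Each within-paragraph pair is bounded using |cov| <= 1; pairs from
    different paragraphs contribute 0 by the conditional independence assumption.
    """
    n_items = n_paragraphs * questions_per_paragraph
    within_pairs = n_paragraphs * comb(questions_per_paragraph, 2)
    return within_pairs / comb(n_items, 2)


# Bound indexed by test length N = 5n for a few numbers of paragraphs n.
BOUND_BY_N = {5 * n: paragraph_bound(n) for n in (1, 2, 10, 100)}
```

With five questions per paragraph the bound simplifies to 4 / (N - 1), so it is 1 for a single paragraph and falls below 0.01 once the test has 100 paragraphs.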

The  example,  by  displaying  a  test  that  clearly  should  be  psychometrically 
labeled  as  "unidimensional",  illustrates  our  view  that  minor  or  idiosyncratic 
dimensions  should  be  ignored  in  assessing  test  dimensionality  from  the  applications 
viewpoint. Our requiring EI rather than LI is the key step that makes it possible to
ignore  minor  dimensions  in  assessing  dimensionality. 
The following theorem makes (ii) precise and hence demonstrates one way in which
essential unidimensionality holds.

Theorem 2.2. Let $\{U_i, i \ge 1\}$ be given. Suppose that local independence holds with
respect to $(\Theta, \Theta_1, \Theta_2, \ldots)$. Suppose conditional on $\Theta$ that $(\Theta_1, \Theta_2, \ldots)$ are
mutually independent. Suppose that each $\Theta_i$ influences at most $L$ items and that
each item depends on at most $K$ $\Theta_i$'s. Suppose WM holds for $\{\mathbf{U}_N, \Theta, N \ge 1\}$. Then
essential unidimensionality holds with respect to $\Theta$.

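Proof. It suffices to prove (7). Consider $\boldsymbol{\Theta} = (\Theta, \Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)})$, a splitting of
$\boldsymbol{\Theta}$ up into four subsets. Consider $U_i$, $U_j$ for which $U_i$ depends on $(\Theta, \Theta^{(1)})$ only and
$U_j$ depends on $(\Theta, \Theta^{(2)})$ only, with no dependence on $\Theta^{(3)}$. Then, using the standard
calculus for conditional probabilities,

$$\begin{aligned}
\operatorname{cov}(U_i, U_j \mid \Theta) &= E[\operatorname{cov}(U_i, U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\quad + \operatorname{cov}[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}),\ E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&= \operatorname{cov}[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}),\ E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta].
\end{aligned}$$

Here we have used the fact that by local independence and the nondependence on $\Theta^{(3)}$,

$$0 = \operatorname{cov}(U_i, U_j \mid \boldsymbol{\Theta}) = \operatorname{cov}(U_i, U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}).$$

Now,

$$\begin{aligned}
&\operatorname{cov}[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}),\ E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\quad = E[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)})\, E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\qquad - E[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta]\, E[E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\quad = E[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta]\, E[E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\qquad - E[E(U_i \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta]\, E[E(U_j \mid \Theta, \Theta^{(1)}, \Theta^{(2)}) \mid \Theta] \\
&\quad = 0.
\end{aligned}$$

The above factoring was allowed because $\Theta^{(1)}$ and $\Theta^{(2)}$ are independent given $\Theta$ by
hypothesis. Thus, we have proved that when $U_i$ depends on $(\Theta, \Theta^{(1)})$ only and $U_j$
depends on $(\Theta, \Theta^{(2)})$ only that

$$\operatorname{cov}(U_i, U_j \mid \Theta) = 0.$$

Recall that

$$|\operatorname{cov}(U_i, U_j \mid \Theta)| \le 1 \tag{9}$$
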

for arbitrary $i, j$. Thus, using (9) an upper bound for $D_N(\theta)$ of (7) is given by

$$C \Big/ \binom{N}{2},$$

where $C$ is the number of item pairs with indices $\le N$ for which both items depend on
at least one common $\Theta_k$. Consider a fixed item with index $\le N$. It depends on at
most $K$ $\Theta_k$'s. For each such $\Theta_k$ there are at most $L$ items also dependent on $\Theta_k$.
Thus the total number of items sharing at least one common $\Theta_k$ with the fixed item is
bounded above by $LK$. Thus the total number of item pairs $C$ is bounded above by $LKN$. I.e.,

$$D_N(\theta) \le \binom{N}{2}^{-1} C \le \frac{2LKN}{N(N - 1)} \to 0$$

as $N \to \infty$, as desired. Since WM holds by hypothesis, the theorem is proved.

□
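
The counting step of the proof can be illustrated on a concrete item-by-trait structure. The sketch below is an assumption-laden illustration, not part of the original argument: the random design, the helper names, and the parameter values are all hypothetical. It counts the pairs of items sharing at least one minor trait and compares the count with the bound $LKN$.

```python
import random
from itertools import combinations


def count_dependent_pairs(item_traits):
    """Number of item pairs that share at least one minor trait."""
    return sum(1 for a, b in combinations(item_traits, 2) if a & b)


def max_items_per_trait(item_traits):
    """L: the largest number of items influenced by any single minor trait."""
    counts = {}
    for traits in item_traits:
        for t in traits:
            counts[t] = counts.get(t, 0) + 1
    return max(counts.values(), default=0)


def random_design(num_items, num_traits, seed):
    """Hypothetical design: each item loads on one or two minor traits."""
    rng = random.Random(seed)
    return [set(rng.sample(range(num_traits), rng.choice((1, 2))))
            for _ in range(num_items)]


if __name__ == "__main__":
    N = 200
    design = random_design(N, N // 4, seed=0)
    K = max(len(t) for t in design)   # most traits used by a single item
    L = max_items_per_trait(design)
    C = count_dependent_pairs(design)
    print(f"C = {C}, LKN = {L * K * N}, 2LKN / (N(N-1)) = {2 * L * K * N / (N * (N - 1)):.3f}")
```

The bound is useful asymptotically only when $L$ and $K$ stay fixed as $N$ grows, which is exactly the hypothesis of Theorem 2.2.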

Remark. The mathematical assumption of LI with respect to $\boldsymbol{\Theta}$ in Theorem 2.2
simply means from the psychometric viewpoint that examinee performance is completely
explained by $\boldsymbol{\Theta}$.

Theorem  2.3  now  makes  explicit  one  way  in  which  (iii)  above  implies  essential 
unidimensionality. 

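Theorem 2.3. Let $\{U_i, i \ge 1\}$ be given. Suppose that local independence holds with
respect to $\boldsymbol{\Theta} = (\Theta, \Theta^{(1)})$. Let

$$e_i(\theta) = \sup_{\theta^{(1)}} P_i(\theta, \theta^{(1)}) - \inf_{\theta^{(1)}} P_i(\theta, \theta^{(1)}).$$

Suppose that the dependence of items on $\Theta^{(1)}$ is asymptotically negligible in the
sense that the IRFs $P_i(\boldsymbol{\theta})$ satisfy for every $\theta$

$$\frac{1}{N} \sum_{i=1}^{N} e_i(\theta) \to 0 \tag{10}$$

as $N \to \infty$. Suppose WM holds for $\{\mathbf{U}_N, \Theta, N \ge 1\}$. Then essential unidimensionality
holds with respect to $\Theta$.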

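Proof. It suffices to prove (7). For an arbitrary pair of items $(U_i, U_j)$, using
standard probability calculus, by the assumption of local independence,

$$\begin{aligned}
|\operatorname{cov}(U_i, U_j \mid \Theta)| &= \big| E[\operatorname{cov}(U_i, U_j \mid \boldsymbol{\Theta}) \mid \Theta] + \operatorname{cov}[E(U_i \mid \boldsymbol{\Theta}),\ E(U_j \mid \boldsymbol{\Theta}) \mid \Theta] \big| \\
&= \big| \operatorname{cov}[E(U_i \mid \boldsymbol{\Theta}),\ E(U_j \mid \boldsymbol{\Theta}) \mid \Theta] \big|.
\end{aligned}$$

Since $\operatorname{cov}(X, Y)^2 \le \operatorname{Var}(X) \operatorname{Var}(Y)$ holds in general,

$$\begin{aligned}
\big| \operatorname{cov}[E(U_i \mid \boldsymbol{\Theta}),\ E(U_j \mid \boldsymbol{\Theta}) \mid \Theta] \big|
&\le \operatorname{Var}^{1/2}[E(U_i \mid \boldsymbol{\Theta}) \mid \Theta]\ \operatorname{Var}^{1/2}[E(U_j \mid \boldsymbol{\Theta}) \mid \Theta] \\
&= \operatorname{Var}^{1/2}[P_i(\boldsymbol{\Theta}) \mid \Theta]\ \operatorname{Var}^{1/2}[P_j(\boldsymbol{\Theta}) \mid \Theta].
\end{aligned}$$

Combining,

$$|\operatorname{cov}(U_i, U_j \mid \Theta)| \le \operatorname{Var}^{1/2}[P_i(\boldsymbol{\Theta}) \mid \Theta]\ \operatorname{Var}^{1/2}[P_j(\boldsymbol{\Theta}) \mid \Theta]. \tag{11}$$

Since for $a_i \ge 0$, $\sum_{i<j} a_i a_j \le \frac{1}{2} \big( \sum_i a_i \big)^2$, it follows that, using (11),

$$\binom{N}{2}^{-1} \sum_{1 \le i < j \le N} |\operatorname{cov}(U_i, U_j \mid \Theta)| \le \binom{N}{2}^{-1} \frac{1}{2} \Big\{ \sum_{i=1}^{N} \operatorname{Var}^{1/2}[P_i(\boldsymbol{\Theta}) \mid \Theta] \Big\}^2.$$

Because $0 \le a \le X \le b$ implies $\operatorname{Var} X \le (b - a)^2$, it follows that

$$\binom{N}{2}^{-1} \sum_{1 \le i < j \le N} |\operatorname{cov}(U_i, U_j \mid \Theta = \theta)| \le \binom{N}{2}^{-1} \frac{1}{2} \Big\{ \sum_{i=1}^{N} e_i(\theta) \Big\}^2.$$

Thus (7) follows by (10). □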

3. Application to Consistent Estimation of Ability. We turn now to the problem of
estimating a particular latent ability $\Theta$ of interest in the presence of other
("nuisance") abilities. The act of specifying an IRT model usually includes the
choice of a specific latent ability scale $\boldsymbol{\Theta}$ and hence a specific scale for the
ability of interest $\Theta$. Throughout this paper "$\Theta$ scale" will refer to the scale
for the ability of interest that is predetermined by the specific IRT model chosen.
The following dichotomy holds for ability estimation: (i) Because the $\Theta$ scale has
been established by its use in other test settings, because the $\Theta$ scale is linked
to some external criterion, or because the $\Theta$ scale has an intrinsic theoretical
justification, etc., one may insist on estimating the ability of interest using the
$\Theta$ scale; or (ii) nothing about the test setting makes the $\Theta$ scale or any other
scale particularly preferable to use. That is, in the case of (ii), any strictly
increasing transformation $A(\Theta)$ yields an equally acceptable scale for the purpose
of estimating ability. In Case (ii), it is thus appropriate to choose, for
statistical convenience perhaps, a particular scaling $A(\Theta)$ on the real line with
apparent interval scaling properties as long as inferences drawn depend only on the
ordinal nature of the real line. The point is well discussed in Section 1.6 of Lord
and Novick, 1968. In summary Lord and Novick state "A major problem of mental test
theory is to determine a good interval scaling to impose when the supporting theory
implies only ordinal properties." For example, LOGIST and BILOG, likely the two
most commonly used ability estimation programs, both create such an "interval"
ability scale. Mislevy (1987) stresses that from a pure IRT model fit viewpoint
(that is, without extraneous requirements such as Rasch's specific objectivity), only
ordinal scaling properties are defensible. We will refer in this paper to the
creation of a convenient ability scale in the absence of prior scaling constraints as
ordinal scaling.

The modeling framework we adopt to investigate ability estimation is the WM, EI
infinite item pool framework $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ introduced in Section 2. It is
assumed that the ability of interest $\Theta$ is determined by $\boldsymbol{\Theta}$; that is, that $\Theta$ is
a function of $\boldsymbol{\Theta}$. Mathematically speaking, there may well exist item pools
$\{U_i, i \ge 1\}$ for which $d_E = \infty$. But, psychometrically this is unrealistic: Recall
that it is assumed that $\{U_i, i > N\}$ is constructed by continuing the same
process that produced $\mathbf{U}_N$ and thus that $\{U_i, i \ge 1\}$ will be of the same
dimensional character as $\mathbf{U}_N$. Taking this into consideration, from the psychometric
viewpoint it is clear for virtually all tests to be modeled that it is realistic to
assume at most a finite number of dominant latent dimensions. Thus we assume for our
infinite item pool framework that $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ has $d_E < \infty$ with the assurance
that this assumption is not psychometrically restrictive.

In order to illuminate certain theoretical issues most clearly, in Section 3.1
ability estimation will be considered first in its simplest setting, namely where
only ordinal scaling (recall (ii) above) is required. Then, in Section 3.2 we
consider ability estimation for the perhaps more useful case where use of the $\Theta$
scale is desirable or required.

3.1 Estimation of ability in the ordinal scaling case. We first select a natural
scale for the ability $\Theta$ of interest. Let

$$P_i(\theta) = E[P_i(\boldsymbol{\Theta}) \mid \Theta = \theta] \tag{12}$$

where $P_i(\boldsymbol{\theta})$ is the $i$th item response function of the latent model $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$,
$\Theta$ is the random latent variable of interest, and the conditional expectation in
(12) is with respect to the conditional distribution of $\boldsymbol{\Theta}$, given $\Theta = \theta$.

The following probabilistic calculation justifies calling $P_i(\theta)$ in (12) the
$i$th marginal item response function with respect to ability $\Theta$: By definition

$$P_i(\boldsymbol{\theta}) = P[U_i = 1 \mid \boldsymbol{\Theta} = \boldsymbol{\theta}].$$

Thus

$$E\{P_i(\boldsymbol{\Theta}) \mid \Theta = \theta\} = E\{P[U_i = 1 \mid \boldsymbol{\Theta}] \mid \Theta = \theta\} = P[U_i = 1 \mid \Theta = \theta],$$

this last equality holding since expectation is a "projection operator." We have
thus proved that for all $\theta$,

$$P_i(\theta) = P[U_i = 1 \mid \Theta = \theta]. \tag{13}$$

That is, $P_i(\theta)$, while defined as the probabilistic average of getting Item $i$ correct
with averaging over all $\boldsymbol{\theta}$ for which $\Theta = \theta$, is also an item response function in
the ordinary sense of being the probability of getting Item $i$ correct for a randomly
sampled examinee of ability $\Theta = \theta$.

Let

$$A_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} P_i(\theta). \tag{14}$$

$A_N(\theta)$ is called the intrinsic ability scale for $\Theta$ relative to the test $\mathbf{U}_N$ and
the examinee population $\boldsymbol{\Theta}$. $A_N(\theta)$ has an interpretation bridging classical test
theory and IRT: $A_N(\theta)$ is the expected test score, that is, true score, among all
examinees with latent ability $\theta$. Under the assumption of strict monotonicity for
$\sum_{i=1}^{N} P_i(\boldsymbol{\theta})/N$ of the latent model, a slight modification of Theorem 3.1 below implies
that $A_N(\theta)$ is strictly increasing in $\theta$ and hence is an acceptable scale for
estimating the ability of interest $\Theta$ in the ordinal scaling case. As such, $A_N(\theta)$
for fixed $N$ is our recommended choice of scale in the ordinal scaling case. It
helps to recall that the assumption of WM is that $\sum_{i=1}^{N} P_i(\boldsymbol{\theta})/N$ be monotone.
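
As an illustration of the intrinsic ability scale, the following sketch evaluates the average of the item response functions on a grid of ability values. It assumes, purely for illustration, a unidimensional two-parameter logistic form and hypothetical item parameters; the nonparametric framework above does not require this form.

```python
import math


def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function (an assumed parametric form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def intrinsic_scale(theta, items):
    """Expected proportion correct (true score) at ability theta."""
    return sum(irf_2pl(theta, a, b) for a, b in items) / len(items)


if __name__ == "__main__":
    # Hypothetical (discrimination, difficulty) pairs.
    items = [(0.8, -1.0), (1.2, 0.0), (1.5, 0.5), (0.6, 1.5)]
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta = {theta:+.1f}: intrinsic scale value = {intrinsic_scale(theta, items):.3f}")
```

Because each assumed item response function is strictly increasing, so is their average, which therefore ranks examinees in the same order as the ability itself.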

Considerable recent attention has been focused on nonmonotone unidimensional
item response functions. It has been shown that "attractive distractors" are a
source of nonmonotonicity in multiple choice items. It has been suggested that the
existence of attractive distractors may be explainable by multidimensionality of the
ability space. In this regard, it is interesting to note that a multidimensional
item response function $P(\boldsymbol{\theta})$ can be monotone and yet the corresponding marginal item
response function $P(\theta)$ be nonmonotone:

Example  3.1.  let  P(dj,  0^)  =  (^i  +  <  1.  0  <  ^2  - 

conditional distribution of $\Theta_2$ given $\Theta_1 = \theta_1$ by $f(\theta_2 \mid \theta_1) = \theta_1/4$ if $0 \le \theta_2 \le 4/\theta_1$;
$= 0$ otherwise.

Then the marginal item response function with respect to $\theta_1$ is given by

n^)  - 


■4/(?j 

•  tfl  +  ^2  ■ 

0 

[ . .  SS  "  J 

0^602 


1 ' 


Tf-  1/4  <  <  1. 


But $P(\theta_1)$ is decreasing in $\theta_1$ for all $\theta_1$. □
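
The phenomenon of Example 3.1 can be reproduced numerically. The sketch below assumes an item response function proportional to $\theta_1 + \theta_2$ with a hypothetical normalizing constant of 17 (any constant of at least 16.25 keeps the function in $[0, 1]$ for $1/4 \le \theta_1 \le 1$, $0 \le \theta_2 \le 4/\theta_1$), together with the conditional density $\theta_1/4$ on $[0, 4/\theta_1]$. Under these assumptions the marginal function is $(\theta_1 + 2/\theta_1)/17$.

```python
def irf(theta1, theta2, c=17.0):
    """Assumed bivariate IRF, increasing in both arguments; c is a hypothetical normalizer."""
    return (theta1 + theta2) / c


def marginal_irf(theta1, c=17.0, steps=2000):
    """Integrate irf against the conditional density theta1/4 on [0, 4/theta1] (midpoint rule)."""
    upper = 4.0 / theta1
    h = upper / steps
    density = theta1 / 4.0
    total = sum(irf(theta1, (k + 0.5) * h, c) for k in range(steps))
    return total * h * density


def closed_form(theta1, c=17.0):
    """Closed form of the same integral under the assumptions above."""
    return (theta1 + 2.0 / theta1) / c


if __name__ == "__main__":
    for theta1 in (0.25, 0.5, 0.75, 1.0):
        print(f"theta1 = {theta1:.2f}: marginal = {marginal_irf(theta1):.4f}, "
              f"closed form = {closed_form(theta1):.4f}")
```

The marginal function decreases on $[1/4, 1]$ even though the bivariate function increases in each coordinate, because larger $\theta_1$ concentrates $\Theta_2$ on smaller values.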

Of course, as is intuitively clear, mild and natural regularity conditions
preclude this nonmonotone behavior. Indeed, the nonmonotonicity of a marginal item
response function can occur only when the multidimensional ability $\boldsymbol{\Theta}$ has some sort
of negative association among its components, as Theorem 3.1 below makes clear by
setting $N = 1$.

Definition 3.1. A random vector $Y$ is said to be stochastically larger than a
random vector $X$ if, for all $t$,

$$P[X > t] \le P[Y > t] \tag{15}$$

with strict inequality for at least one $t$.

The following fact is well known:

Lemma 3.1. Let $Y$ be stochastically larger than $X$, and let $f$ be a nonnegative
real valued function that is nondecreasing in each of its arguments. Then

$$E f(X) \le E f(Y).$$

Theorem 3.1. Let the latent model $\{\mathbf{U}_N, \boldsymbol{\Theta}\}$ be monotone. Let $\boldsymbol{\Theta} =
(\Theta, \Theta^{(2)})$. Suppose that, for every pair of real numbers $\theta' < \theta''$, the
distribution of $\Theta^{(2)}$ given $\Theta = \theta''$ is stochastically larger than the distribution
of $\Theta^{(2)}$ given $\Theta = \theta'$. Then, $A_N(\theta)$ is monotone.

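Proof. It must be shown for each $N$ that

$$A_N(\theta) = \int \frac{1}{N} \sum_{i=1}^{N} P_i(\theta, \theta^{(2)}) \, dF(\theta^{(2)} \mid \theta)$$

is nondecreasing in $\theta$ where $F(\theta^{(2)} \mid \theta)$ denotes the distribution of $\Theta^{(2)}$, given
$\Theta = \theta$. Fix real numbers $\theta' < \theta''$. By Lemma 3.1, noting that $\frac{1}{N} \sum_{i=1}^{N} P_i(\theta, \theta^{(2)})$ is
for fixed $\theta$ a nondecreasing function of $\theta^{(2)}$ by the assumption of monotonicity,

$$\int \frac{1}{N} \sum_{i=1}^{N} P_i(\theta', \theta^{(2)}) \, dF(\theta^{(2)} \mid \theta') \le \int \frac{1}{N} \sum_{i=1}^{N} P_i(\theta', \theta^{(2)}) \, dF(\theta^{(2)} \mid \theta''). \tag{16}$$

But, because $\frac{1}{N} \sum_{i=1}^{N} P_i(\theta, \theta^{(2)})$ is nondecreasing in $\theta$ for each fixed $\theta^{(2)}$,

$$\int \frac{1}{N} \sum_{i=1}^{N} P_i(\theta', \theta^{(2)}) \, dF(\theta^{(2)} \mid \theta'') \le \int \frac{1}{N} \sum_{i=1}^{N} P_i(\theta'', \theta^{(2)}) \, dF(\theta^{(2)} \mid \theta''). \tag{17}$$

The combination of (16) and (17) yields the desired result. □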

Remark. If an LI model $\{\mathbf{U}_N, \boldsymbol{\Theta}\}$ is monotone, then Theorem 3.1 makes
clear that assuming $A_N(\theta)$ monotone is a mild assumption.

We now turn directly to the ability estimation problem in the ordinal scaling
case. As we shall see, essential unidimensionality characterizes the consistent
estimation of the unidimensional latent ability on some scale; moreover, it implies,
from the ordinal scaling viewpoint, that the latent ability is unique. It is in this
spirit that an essential trait (recall Definition 2.4) can be referred to as the
essential trait with respect to which the items are essentially unidimensional.

Theorem 3.2 below asserts that essential unidimensionality is precisely the
condition needed for consistent estimation of ability. Before stating Theorem 3.2,
we must carefully state what it means to consistently estimate ability using an
infinite item pool formulation. Recall our ordinal scaling viewpoint that any
strictly monotone transformation of $\Theta$ (for example $A_N(\Theta)$, which is strictly
monotone when the marginal IRFs are) is an acceptable scale on which to
estimate $\Theta$.

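Definition 3.2. (i) It is said that $\Theta$ may be consistently estimated (in
probability) if for every collection of nonsparse subtests
$\{A_N\} = \{(U_{i_1}, \ldots, U_{i_{J(N)}})\}$, for each $\theta$, given $\Theta = \theta$,

$$\frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} - \frac{1}{J(N)} \sum_{j=1}^{J(N)} P_{i_j}(\theta) \to 0 \tag{18}$$

in probability as $N \to \infty$.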

Remarks. Specialized to the present setting, the traditional statistical definition
of consistency applies separately to each estimator $\hat{\theta}(\mathbf{U}_N)$ of $\theta$ as follows:
$\hat{\theta}(\mathbf{U}_N)$ is a consistent estimator of $\theta$ if for all $\theta$, given $\Theta = \theta$,

$$\hat{\theta}(\mathbf{U}_N) \to \theta$$

in probability as $N \to \infty$. By contrast, our psychometric notion of item pool
consistency proposed by Definition 3.2 is: (i) stronger than the traditional
definition in that convergence is required simultaneously for all collections of
nonsparse subtests; (ii) weaker in the sense that the required convergence for each
scaling $\{\sum_{j=1}^{J(N)} P_{i_j}(\theta)/J(N)\}$ is not to $\theta$ but rather to a convenient rescaling
of $\theta$ that varies with $\{A_N\}$, and hence the psychometric notion of consistency
is an ordinal scaling concept; and (iii) different in the sense that statistical
consistency is a property that each estimator $\{\hat{\theta}(\mathbf{U}_N), N \ge 1\}$ either has or doesn't
have while psychometric consistency is a property that the infinite item pool (that
is, the sequence $\{U_i, i \ge 1\}$) either has or doesn't have.

The intuitive idea of Definition 3.2 is that any reasonable collection of
subtests should be able to estimate $\Theta$ in our ordinal scaling sense. Not all
collections of subtests need necessarily be usable to estimate $\Theta$ in the ordinal
scaling sense. For example if every $2^k$th item in a "mathematics" test were a
"verbal" item, then $\sum_{k=1}^{M} U_{2^k}/M$ would estimate verbal ability. But this is
developed from a nondense collection of subtests and hence is not required to
asymptotically estimate $\Theta$ for consistency to hold. However, if every 10th item
were a verbal item, then Definition 3.2 implies that consistency does not hold
because obviously the corresponding subtests are nonsparse.

A reasonable question to ask is whether in formulating our definition of test
consistency, it would suffice to merely require for each $\theta$ that, given $\Theta = \theta$,

$$\bar{U}_N - A_N(\theta) \to 0 \tag{19}$$

in probability as $N \to \infty$ instead of requiring (18). However merely requiring (19)
is vacuous: Any test of fixed length $N$ can be appropriately embedded in an
essentially unidimensional sequence of tests $\{\mathbf{U}_N, \Theta, N \ge 1\}$ for which (19) holds
for a judicious choice of $\Theta$. The following example (suggested by remarks of D. R.
Divgi) illustrates this embedding for a test where 50% of the items measure one trait
and 50% measure a second trait. It presents an essentially two dimensional infinite
item pool where (19) is satisfied for a mathematically judicious choice of $\Theta$, $\Theta$
being some function of the dimensions $(\Theta_1, \Theta_2)$, a situation where most
psychometricians would prefer to split the test up into two unidimensional tests and
only then address the issue of consistency. Requiring (18) instead of (19) as a
definition of consistency reflects this consideration.

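Example 3.2. Let $\{U_i, i \ge 1\}$ be a LI, M latent model with $\boldsymbol{\Theta} = (\Theta_1, \Theta_2)$ and
for $i \ge 1$,

$$P_{2i}(\boldsymbol{\theta}) = \theta_1, \qquad P_{2i-1}(\boldsymbol{\theta}) = \theta_2,$$

where the distribution of $\boldsymbol{\Theta}$ is given by $\Theta_1$ and $\Theta_2$ independent identically
distributed with $\Theta_1$ uniformly distributed on $[0,1]$. Let $\Theta = \Theta_1 + \Theta_2$. Fix
$\theta \le 1$. Then, standard multivariable calculus yields for $1 \le i, j$; $1 \le k$

$$\operatorname{Var}(U_k \mid \Theta = \theta) = \frac{\theta}{2}\Big(1 - \frac{\theta}{2}\Big), \qquad \operatorname{cov}(U_{2i-1}, U_{2j} \mid \Theta = \theta) = -\frac{\theta^2}{12}$$

and, if $i \ne j$,

$$\operatorname{cov}(U_{2i}, U_{2j} \mid \Theta = \theta) = \frac{\theta^2}{12}, \qquad \operatorname{cov}(U_{2i-1}, U_{2j-1} \mid \Theta = \theta) = \frac{\theta^2}{12}. \tag{20}$$

Thus, using (20),

$$\operatorname{Var}(\bar{U}_{2K} \mid \Theta = \theta) = \frac{1}{(2K)^2} \Big[ \sum_{i=1}^{2K} \operatorname{Var}(U_i \mid \Theta = \theta) + \sum_{1 \le i \ne j \le 2K} \operatorname{cov}(U_i, U_j \mid \Theta = \theta) \Big] = \frac{1}{4K}\Big(\theta - \frac{2\theta^2}{3}\Big).$$

Further, $E[\bar{U}_{2K} \mid \Theta = \theta] = \theta/2$. Thus
$P[|\bar{U}_{2K} - \theta/2| > \epsilon \mid \Theta = \theta] \le \operatorname{Var}(\bar{U}_{2K} \mid \Theta = \theta)/\epsilon^2 \to 0$ as
$K \to \infty$. Hence, for each $\theta$, given $\Theta = \theta$,

$$\bar{U}_{2K} - \frac{\theta}{2} \to 0$$

in probability as $K \to \infty$. A similar analysis holds for $\bar{U}_{2K-1}$ and for $1 < \theta \le 2$.

Thus (19) does hold in what is clearly an essentially two dimensional sequence of
tests. However, it is intuitively clear that (18) fails since the even items can be
averaged to consistently estimate $\theta_1$ and odd items to consistently estimate $\theta_2$. □

Remark. Example 3.2 illustrates that $\bar{U}_N$ will estimate a mixture of the
essential dimensions when they exist in asymptotically fixed proportions as $N$
increases.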
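
The contrast between (19) and (18) in Example 3.2 can be made concrete with exact arithmetic. The sketch below assumes the setting of the example ($\Theta_1$, $\Theta_2$ independent uniform on $[0,1]$, even items with response function $\theta_1$, odd items with $\theta_2$, conditioning on $\Theta_1 + \Theta_2 = \theta \le 1$), under which each item has conditional variance $(\theta/2)(1 - \theta/2)$ and same-trait items have conditional covariance $\theta^2/12$. The function names are hypothetical.

```python
from fractions import Fraction


def var_full_average(K, theta):
    """Conditional variance of the proportion correct over the first 2K items."""
    theta = Fraction(theta)
    var_item = (theta / 2) * (1 - theta / 2)
    c = theta ** 2 / 12
    same_parity_pairs = 2 * K * (K - 1)  # ordered pairs i != j, both even or both odd
    mixed_pairs = 2 * K * K              # ordered pairs with one even and one odd index
    total = 2 * K * var_item + same_parity_pairs * c - mixed_pairs * c
    return total / (2 * K) ** 2


def var_even_average(K, theta):
    """Conditional variance of the proportion correct over the K even items only."""
    theta = Fraction(theta)
    var_item = (theta / 2) * (1 - theta / 2)
    c = theta ** 2 / 12
    return (K * var_item + K * (K - 1) * c) / K ** 2


if __name__ == "__main__":
    theta = Fraction(1, 2)
    for K in (1, 10, 100, 1000):
        print(K, float(var_full_average(K, theta)), float(var_even_average(K, theta)))
```

The full-test variance vanishes at rate $1/K$, while the even-item subtest variance converges to $\theta^2/12 > 0$: the nonsparse subtest of even items does not settle on any function of $\theta$ alone, which is how (18) fails.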

Now Theorem 3.2 can be stated and proved.

Theorem 3.2. Let $\{U_i, i \ge 1\}$ be essentially unidimensional ($d_E = 1$) with respect
to ability $\Theta$. Then, $\Theta$ may be consistently estimated. In particular, for each $\theta$,
given $\Theta = \theta$, (19) holds.

Conversely, if for some WM latent model $\{\mathbf{U}_N, \Theta, N \ge 1\}$ the unidimensional $\Theta$
may be consistently estimated then $\{\mathbf{U}_N, \Theta, N \ge 1\}$ is an EI, WM representation and
hence $d_E = 1$ holds.

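Proof. Assume $d_E = 1$ with respect to $\Theta$. Let $\{A_N\} = \{(U_{i_1}, \ldots, U_{i_{J(N)}})\}$ be a
collection of nonsparse subtests. Fix $\epsilon > 0$ and $\theta$. Then

$$P\Big[\Big|\frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} - \frac{1}{J(N)} \sum_{j=1}^{J(N)} P_{i_j}(\theta)\Big| > \epsilon \,\Big|\, \Theta = \theta\Big] \le \operatorname{Var}\Big(\frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} \,\Big|\, \Theta = \theta\Big) \Big/ \epsilon^2 \tag{21}$$

because $E\big[\frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} \mid \Theta = \theta\big] = \frac{1}{J(N)} \sum_{j=1}^{J(N)} P_{i_j}(\theta)$. Noting by nonsparseness
that $N/J(N) \le C < \infty$ for all $N$ and that
i  rJ>(N)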


it  follows  that 


0  <  Var  je  =  d  =  Var(I).ie  =  6)  +  D^(0)  < 

^  +  Dj^(^)  -»  0  (23) 

by the assumption of essential unidimensionality. Combining with (21) then
establishes test consistency, as desired.

Conversely suppose $\Theta$ may be consistently estimated. Fix $\theta$ and a collection
of nonsparse subtests $\{A_N\}$. Then, by (23)

»!((»)  >  -  K  •  (24) 

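Thus, $D_N(\theta)$ cannot have any negative limit points.

For any bounded random variable $X$, denoting the bound by $a$ (i.e., $|X| \le a$),
the well-known converse to Chebychev's inequality states that

$$P[|X| \ge \epsilon] \ge \frac{E(X^2) - \epsilon^2}{a^2}.$$

Let

$$X_N = \frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} - \frac{1}{J(N)} \sum_{j=1}^{J(N)} P_{i_j}(\theta).$$

Then noting that $|X_N| \le 1$,

$$P[|X_N| \ge \epsilon \mid \Theta = \theta] \ge E(X_N^2 \mid \Theta = \theta) - \epsilon^2 \ge \operatorname{Var}(X_N \mid \Theta = \theta) - \epsilon^2.$$

By consistency, $X_N \to 0$ in probability as $N \to \infty$. Thus,

$$P[|X_N| \ge \epsilon \mid \Theta = \theta] \to 0 \quad \text{as } N \to \infty. \tag{26}$$

Thus, using (23) and (26), for each $\epsilon > 0$,

$$\operatorname{Var}(X_N \mid \Theta = \theta) - \epsilon^2$$

has no positive limit points. But, by (22) and the fact that $J(N) \to \infty$ it then
follows that $D_N(\theta)$ has no positive limit points. Since it was also concluded that
$D_N(\theta)$ has no negative limit points, $D_N(\theta) \to 0$ as $N \to \infty$. Thus $d_E = 1$ has been
established, as desired. □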


Remark. It is interesting to note that Theorem 3.2 guarantees the consistent
estimation of ability even if the IRFs are unknown to the practitioner. That is, use
of $\bar{U}_N$ to estimate $\sum_{i=1}^{N} P_i(\theta)/N$ does not require knowledge of $\sum_{i=1}^{N} P_i(\theta)/N$.
As long as no attempt is being made to establish a standardized ability scale across tests
(e.g., as a precursor to equating tests) knowledge of the IRFs is not required.

It is a foundationally relevant fact that essential unidimensionality implies
under a mild and natural regularity condition for $\{U_i, i \ge 1\}$ that the latent
ability is, in the ordinal sense, unique, as Theorem 3.3 below asserts.

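Definition 3.3. Let $\{U_i, i \ge 1\}$ be essentially unidimensional with respect to
ability $\Theta$ and let $R$ be the range of $\Theta$ (i.e., $P[\Theta \in R] = 1$ with $R$
"minimal"). Suppose for every $\theta_1$ in $R$ that there exists $\epsilon_{\theta_1} > 0$ and an open
neighborhood of $\theta_1$ such that for all $\theta_2 \ne \theta_1$ in this neighborhood and in the range $R$ of $\Theta$ that

$$\frac{1}{N} \sum_{i=1}^{N} \frac{P_i(\theta_2) - P_i(\theta_1)}{\theta_2 - \theta_1} \ge \epsilon_{\theta_1} > 0 \quad \text{for all } N. \tag{27}$$

Then $\{\mathbf{U}_N, \Theta, N \ge 1\}$ is said to be locally asymptotically discriminating (LAD) with
respect to $\Theta$.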

Remarks. What LAD really supposes is that $\sum_{i=1}^{N} P_i(\theta)/N$ is increasing faster than
some positive-slope linear function in some neighborhood of $\theta$ for every $\theta$,
independent of $N$. LAD guarantees that on average the items of the test $\{U_i, i \ge 1\}$
are sufficiently discriminating locally with respect to $\Theta$. Note that LAD is a
strengthening of WM.

Theorem 3.3. Suppose $\{U_i, i \ge 1\}$ is essentially unidimensional with respect to both
$\Theta$ and $\Theta'$. Let the corresponding marginal item response functions be denoted by

$$P_i(\theta) = E[U_i \mid \Theta = \theta], \qquad P_i'(\theta) = E[U_i \mid \Theta' = \theta]$$

for all $\theta$. Suppose $\{U_i, i \ge 1\}$ is LAD with respect to $\Theta$. There then exists a
function $g$ defined on the range $R'$ of $\Theta'$ such that

$$\Theta = g(\Theta'), \quad g \text{ nondecreasing},$$

and the range of $g$ is $R$.

Remarks. (i) Theorem 3.3 states that, in the sense of ordinal scaling, all
scales with LAD holding are the same. In this precise sense the latent variable is
unique.

(ii) Since a $d = 1$, WM, LI model is also an EI model, note that Theorem 3.3
holds for $d = 1$, LAD, LI models as well. Thus Theorem 3.3 may be of interest even
if one does not wish to use EI in IRT modeling.

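Proof of Theorem 3.3. By Theorem 3.2, for each $\theta$ and $\theta'$

$$\bar{U}_N - A_N(\theta) \to 0 \tag{28}$$

in probability given $\Theta = \theta$ (and hence on any subset of $[\Theta = \theta]$) and

$$\bar{U}_N - A_N'(\theta') \to 0 \tag{29}$$

in probability given $\Theta' = \theta'$ (and hence on any subset of $[\Theta' = \theta']$) where
$A_N(\theta) = E[\bar{U}_N \mid \Theta = \theta]$ and $A_N'(\theta') = E[\bar{U}_N \mid \Theta' = \theta']$.

Let

$$G_{\theta, \theta'} = [\Theta = \theta] \cap [\Theta' = \theta']$$

for all $\theta$, $\theta'$. Then, for each $\theta$, $\theta'$ such that $G_{\theta, \theta'} \ne \emptyset$, (28) and (29) imply on
$G_{\theta, \theta'}$ that

$$A_N(\theta) - A_N'(\theta') \to 0. \tag{30}$$

Fix $\theta' \in R'$ and let, denoting the empty set by $\emptyset$,

$$S_{\theta'} = \{\theta : G_{\theta, \theta'} \ne \emptyset\}.$$

Note that $S_{\theta'} \ne \emptyset$ for each $\theta' \in R'$ because each examinee has an ability value for
both $\Theta$ and $\Theta'$. Suppose $\theta_1 \ne \theta_2$, $\theta_1 \in S_{\theta'}$, $\theta_2 \in S_{\theta'}$, with $\theta_2 > \theta_1$ without
loss of generality. Then (30) implies that

$$A_N(\theta_2) - A_N(\theta_1) \to 0 \quad \text{as } N \to \infty.$$

That is,

$$\frac{1}{N} \sum_{i=1}^{N} [P_i(\theta_2) - P_i(\theta_1)] \to 0,$$

contradicting (27). Thus $S_{\theta'}$ consists of a unique $\theta \in R$ for each $\theta'$: i.e., a
function $g$ is defined:

$$\theta = g(\theta') \quad \text{for all } \theta' \in R'.$$

Choose $\theta_2' > \theta_1'$ with $\theta_1' \in R'$, $\theta_2' \in R'$. Then define

$$\theta_2 = g(\theta_2'), \qquad \theta_1 = g(\theta_1').$$

Now,

$$A_N'(\theta_2') - A_N'(\theta_1') \ge 0$$

because the essentially unidimensional model $\{\mathbf{U}_N, \Theta', N \ge 1\}$ is WM. By the
definition of $g$, recalling (30), it follows that $A_N(\theta_2) - A_N(\theta_1)$ has no negative
limit points. Thus $\theta_2 \ge \theta_1$ by monotonicity of $A_N(\theta)$. That is, $g$ is monotone
nondecreasing and well defined for all $\theta' \in R'$.

Because $[\Theta' = \theta'] \subset [\Theta = g(\theta')]$, the probability space, say $\Omega$, satisfies

$$\Omega = \bigcup_{\theta' \in R'} [\Theta' = \theta'] \subset \bigcup_{\theta' \in R'} [\Theta = g(\theta')]$$

and $\Omega$ can be partitioned

$$\Omega = \bigcup_{\theta \in R} [\Theta = \theta],$$

it follows that the range of $g$ is $R$. □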

Remarks. (i) Note that Theorem 3.3 does not claim that $g$ is strictly increasing.
That is, the rescaling given by $g$ could assign many $\theta'$ to the same $\theta$. Because
no assumption analogous to (27) was made for $\Theta'$, this is of course expected. For,
the $\Theta'$ scale could produce a finer partition of the latent ability space than
needed to achieve essential unidimensionality. Thus the collapsing of distinct $\theta'$
into a single $\theta$ by $g(\theta')$ cannot be ruled out. For example, if for the $\Theta'$
scale there exists an interval $[a,b]$ such that

$$P_i'(\theta') = 0 \quad \text{for all } i,\ \theta' \in [a,b],$$

then the $\Theta'$ scale should be rescaled so that all $\theta' \in [a,b]$ are collapsed
to a single point, say $\theta'$. However, assuming (27) for $\Theta'$ as well does imply a
strictly increasing $g$.

(ii) In a private communication, Brian Junker has pointed out that an alternate
proof of Theorem 3.3 can be given that produces $g$ explicitly. See Junker (1988)
for details.

(iii) Note that the infinite item pool formulation was essential for
establishing the uniqueness of ability scale. It is the author's position that an
infinite item pool formulation greatly aids the study of many foundational IRT
issues. Indeed, that is a major point of this research.

(iv) For unidimensional EI and hence a fortiori for unidimensional LI
models Theorem 3.3 formalizes the well known notion that the $\Theta$ scale is not
completely pinned down. Indeed unless there are solid psychometric grounds for
preferring one interval scale over the rest, the situation is one of a unique ordinal
scale with the choice of a convenient interval scale left up to the practitioner to
be decided on pragmatic grounds.

It is an often quoted axiom of psychometrics that a "test" should be
unidimensional. If not, it should be broken up into a battery of unidimensional
subtests, each to be analyzed separately. Thus, in the context of this paper, the
axiom becomes that a test should be essentially unidimensional.

Theorem 3.2 has an interesting multidimensional analogue. Denote the
multidimensional IRFs by $\{P_i(\boldsymbol{\theta}), i \ge 1\}$.

Theorem 3.4. Suppose essential $d_E$ dimensionality with respect to ability $\boldsymbol{\Theta}$ for
$\{U_i, i \ge 1\}$. Then, $\boldsymbol{\Theta}$ is able to be consistently estimated in probability in the
sense of (18) with $\theta$ replaced by $\boldsymbol{\theta}$ in (18).

Suppose that the essential dimensionality exceeds $d$ for $\{U_i, i \ge 1\}$. Then
there does not exist a $d$ dimensional $\boldsymbol{\Theta}$ such that $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ is WM and for
each collection of nonsparse subtests for all $\boldsymbol{\theta}$, given $\boldsymbol{\Theta} = \boldsymbol{\theta}$,

$$\frac{1}{J(N)} \sum_{j=1}^{J(N)} U_{i_j} - \frac{1}{J(N)} \sum_{j=1}^{J(N)} P_{i_j}(\boldsymbol{\theta}) \to 0 \tag{31}$$

in probability as $N \to \infty$.

Proof. Analogous to that of Theorem 3.2 and omitted. □

Remark. Assume essential dimensionality $d_E$ with respect to $\boldsymbol{\Theta}$ for some $d_E > 1$.
Then according to Theorem 3.4 every collection of nonsparse subtests estimates a
unidimensional latent scale in the sense that for each $\boldsymbol{\theta}$, given $\boldsymbol{\Theta} = \boldsymbol{\theta}$, (31) holds.
Let $g_N(\boldsymbol{\theta})$ denote the latent scale $\sum_{j=1}^{J(N)} P_{i_j}(\boldsymbol{\theta})/J(N)$ of (31). Consider two different
collections of nonsparse subtests resulting in two different choices of $g_N$.
Then the two different $g$'s may be ranking examinees on the basis of two (or more)
totally different abilities, which is unacceptable from the viewpoint of requiring
test consistency. For, consistency requires that all of the $g$'s indeed rank
examinees on the basis of the same unidimensional latent ability. Thus the
unidimensionality of each $\{g_N(\boldsymbol{\theta}), N \ge 1\}$ in (31) is useless because it masks an
inherent and unacceptable (from the practical viewpoint) multidimensionality.

Essential unidimensionality for the essential trait clearly guarantees the
consistent estimation of the unique latent trait $\Theta$ by the results of Section 3.1.
The practitioner who wishes to estimate $\Theta$ then needs to assess for a particular
test $\mathbf{U}_N$ administered to a particular population whether it is reasonable to adopt
an essentially unidimensional model. The author (Stout, 1987) has developed a
statistical procedure specifically designed to assess whether essential
unidimensionality holds or not. This procedure is based on a test statistic that
basically is large or small according as $D_N(\cdot)$ of (7) is large or small. Thus the
issue of essential unidimensionality is an empirically verifiable one. It is the
position of this paper to recommend that the usually untested assumption of a LI,
$d = 1$, M IRT model be replaced by testing whether the more realistic $d_E = 1$ IRT
modeling approach fits the data well or not.

3.2. Balanced linear empirical scaling. Sections 2 and 3.1 combine to produce a
theory that applies to scoring examinees using proportion correct over all
collections of nonsparse subtests. It seems useful to generalize this to a theory
that applies to reasonable linear formula scoring schemes instead of just
proportion correct. We can do so by a minor modification of the concepts of
essential independence, essential unidimensionality, and consistency. In each case
the term "strong" will be used to distinguish these concepts as defined here in
Section 3.2 from Sections 2 and 3.1.

Definition 3.4. (i) A triangular array of item coefficients $\{a_{Ni}, 1 \le i \le
N, N \ge 1\}$ is said to be balanced if there exists $C > 0$ such that

$$0 \le a_{Ni} \le \frac{C}{N} \tag{32}$$

for all $i$ and $N$.

(ii) For a given infinite item pool $\{U_i, i \ge 1\}$, the linear formula scoring
sequence $\{\sum_{i=1}^{N} a_{Ni} U_i, N \ge 1\}$ is called a balanced empirical ability scaling
provided $\{a_{Ni}\}$ is balanced.

Remarks. (i) Definition 3.4 needs interpretation. First, given any triangular
coefficient array $\{a_{Ni}\}$, $\sum_{i=1}^{N} a_{Ni} U_i$ specifies an ability scale for the test $\mathbf{U}_N$ in
the sense that it ranks examinees. This ability scale is not a latent ability scale:
it scales examinees entirely on the basis of their item response patterns and hence
is an empirical (manifest) ability scale. Assumption (32) guarantees that the
scaling is reasonable in that (a) each correct answer can only increase examinee rank
and (b) no single item is allowed to dominate the scaling. Intuitively, as made
precise below, $\sum_{i=1}^{N} a_{Ni} U_i$ for large $N$ can be thought of as an empirical scale that
approximates some unidimensional latent scale.

(ii) Several special cases of linear formula scores are balanced. First,
$a_{Ni} = 1/N$, $1 \le i \le N$, yielding $\{\bar{U}_N, N \ge 1\}$, is clearly
balanced. That is, proportion correct is balanced. Second, it is easy to modify the
proportion correct empirical scaling by using only items from nonsparse subtests to
form the proportion correct scaling. To motivate the third scoring scheme, suppose a
two parameter logistic model for $\{U_i, i \ge 1\}$ with discrimination parameters
$a_i$ satisfying

$$0 < \epsilon < a_i < M \quad \text{for all } i.$$

Then the normalized sufficient statistic

$$\frac{\sum_{i=1}^{N} a_i U_i}{\sum_{i=1}^{N} a_i}$$

is clearly balanced with $a_{Ni} = a_i / \sum_{j=1}^{N} a_j$.
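
The balance condition of Definition 3.4 can be checked numerically for the weighting schemes just listed. The following is a minimal sketch, not part of the original paper: the function names, the discrimination values, and the "dominated" counterexample are all hypothetical, chosen only to illustrate that the bound $a_{Ni} \le C/N$ holds with $C = 1$ for proportion correct and with $C = M/\epsilon$ for normalized two parameter logistic weights.

```python
def is_balanced(rows, C):
    """Check 0 <= a_Ni <= C/N for every row of a triangular array.

    rows[N-1] is the list of the N weights a_N1, ..., a_NN.
    """
    return all(
        all(0.0 <= a <= C / len(row) + 1e-12 for a in row)
        for row in rows
    )


def proportion_correct_weights(N):
    # a_Ni = 1/N for every item: the proportion correct score.
    return [1.0 / N] * N


def normalized_2pl_weights(discriminations):
    # a_Ni = a_i / sum_j a_j: the normalized 2PL sufficient statistic.
    total = sum(discriminations)
    return [a / total for a in discriminations]


def dominated_weights(N):
    # Hypothetical counterexample: item 1 always carries half the weight.
    if N == 1:
        return [1.0]
    return [0.5] + [0.5 / (N - 1)] * (N - 1)


# Hypothetical discriminations, all lying in [eps, M].
eps, M = 0.5, 2.0
a = [eps + (M - eps) * ((7 * i) % 10) / 9.0 for i in range(200)]

pc_rows = [proportion_correct_weights(N) for N in range(1, 201)]
tpl_rows = [normalized_2pl_weights(a[:N]) for N in range(1, 201)]
dominated_rows = [dominated_weights(N) for N in range(1, 201)]

print(is_balanced(pc_rows, 1.0))           # proportion correct, C = 1
print(is_balanced(tpl_rows, M / eps))      # 2PL weights, C = M / eps
print(is_balanced(dominated_rows, M / eps))  # one item dominates the score
```

The third array fails for any fixed $C$ once $N$ is large enough, which is exactly the situation that requirement (b) of Remark (i) is meant to exclude.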

(iii) In addition to the special cases discussed in (ii), Junker (1988) has shown
that the theory of balanced linear scores plays a central role in his establishing
the robustness result that the traditional maximum likelihood estimator $\hat{\theta}$
of $\theta$ is consistent (in the statistical sense) for $\theta$ even when only EI is
known to hold rather than LI.

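Definition 3.5. Strong essential independence holds if for every balanced
$\{a_{Ni}\}$, for each given $\boldsymbol{\Theta} = \boldsymbol{\theta}$,

$$\sum_{1 \le i < j \le N} a_{Ni}\, a_{Nj}\, \operatorname{cov}(U_i, U_j \mid \boldsymbol{\Theta} = \boldsymbol{\theta}) \to 0. \qquad (33)$$

Remark. (33) is easily seen to be equivalent to

$$\operatorname{Var}\Big[\sum_{i=1}^{N} a_{Ni} U_i \,\Big|\, \boldsymbol{\Theta} = \boldsymbol{\theta}\Big] \to 0 \quad \text{as } N \to \infty \qquad (34)$$

for each $\boldsymbol{\theta}$. Note that (7) implies strong essential independence, which
in turn implies essential independence as defined by (5). Recall that (7) and (5) can
be considered the same for practical purposes. Hence (7), (5), and strong essential
independence can be thought of as the same for practical purposes. It follows that
(33) and hence (34) can be thought of as characterizing essential independence with
respect to $\boldsymbol{\Theta}$.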

When one views essential independence as (33) and hence as (34) holding, then
for each balanced $\{a_{Ni}\}$, by Chebychev's well-known probability inequality,

$$\sum_{i=1}^{N} a_{Ni} U_i - g_N(\boldsymbol{\theta}) \to 0$$

in probability as $N \to \infty$ for some latent scaling $g_N(\boldsymbol{\theta})$. Thus
balanced empirical scalings do, as suggested above, approximate some latent scale.
Thus (33) emphasizes the ability of any admissible empirical scaling to "recover" a
latent scale. This will be expanded upon in Section 3.3.
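
The convergence of a balanced empirical scaling to a latent scaling can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it simulates responses from a locally independent two parameter logistic model (a special case that is stronger than essential independence), uses hypothetical item parameters, and takes the latent scaling to be $\sum_i a_{Ni} P_i(\theta)$. None of the names come from the paper.

```python
import math
import random


def irf_2pl(theta, a, b):
    # Two parameter logistic item response function P_i(theta).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_abs_deviation(N, theta, reps, rng):
    """Average |sum_i a_Ni U_i - sum_i a_Ni P_i(theta)| over simulated examinees."""
    # Hypothetical item parameters: discriminations in [0.5, 2], difficulties in [-2, 2].
    items = [
        (0.5 + 1.5 * ((7 * i) % 10) / 9.0, -2.0 + 4.0 * ((3 * i) % 11) / 10.0)
        for i in range(N)
    ]
    total_a = sum(a for a, _ in items)
    weights = [a / total_a for a, _ in items]  # balanced: a_Ni = a_i / sum_j a_j
    probs = [irf_2pl(theta, a, b) for a, b in items]
    latent = sum(w * p for w, p in zip(weights, probs))

    deviation = 0.0
    for _ in range(reps):
        # Local independence is assumed here purely for the simulation.
        score = sum(w for w, p in zip(weights, probs) if rng.random() < p)
        deviation += abs(score - latent)
    return deviation / reps


rng = random.Random(0)
dev_small = mean_abs_deviation(20, 0.3, 400, rng)
dev_large = mean_abs_deviation(500, 0.3, 400, rng)
print(dev_small, dev_large)
```

With these assumed parameters the average deviation for the 500-item test should be several times smaller than for the 20-item test, in line with the Chebychev argument.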

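Let $A \equiv \{a_{Ni}\}$ denote an arbitrary balanced sequence. Define the notation
$g_{A,N}(\theta)$ for a unidimensional $\theta$ by

$$g_{A,N}(\theta) \equiv E\Big[\sum_{i=1}^{N} a_{Ni} U_i \,\Big|\, \Theta = \theta\Big] = \sum_{i=1}^{N} a_{Ni} P_i(\theta). \qquad (35)$$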


Definition 3.6. It is said that $\Theta$ may be strongly consistently estimated if for
every balanced empirical ability scaling $\{\sum_{i=1}^{N} a_{Ni} U_i,\ N \ge 1\}$, for
each $\theta$, given $\Theta = \theta$,

$$\sum_{i=1}^{N} a_{Ni} U_i - g_{A,N}(\theta) \to 0 \qquad (36)$$

in probability as $N \to \infty$.

Remark. Clearly strong consistency implies consistency as defined in Section 3.1.
It is not known whether consistency can hold and strong consistency not hold.

Replacing collections of nonsparse subtests by balanced linear formula scores
yields a slightly modified version of Theorem 3.2, which is stated below. First we
define strong essential dimensionality.

Definition 3.7. (i) An item pool is said to be weakly monotone for all balanced
sequences (WMB) with respect to $\boldsymbol{\Theta}$ if

$$\sum_{i=1}^{N} a_{Ni} P_i(\boldsymbol{\theta})$$

is monotone for all $N$ and for every balanced $\{a_{Ni}\}$.

(ii) The strong essential dimensionality $d_E$ of an item pool $\{U_i, i \ge 1\}$ is the
minimal dimensionality required for a latent trait $\boldsymbol{\Theta}$ to make the
latent model $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ a strongly essentially
independent WMB model. When $d_E = 1$ in the strong sense, strong essential
unidimensionality is said to hold.

Theorem 3.5. Let $\{U_i, i \ge 1\}$ be strongly essentially unidimensional with respect
to ability $\Theta$. Then $\Theta$ may be strongly consistently estimated. In
particular, for each given $\Theta = \theta$, (18) and (19) hold.

Conversely, if for some monotone latent model $\{\mathbf{U}_N, \Theta, N \ge 1\}$ the
unidimensional $\Theta$ may be strongly consistently estimated, then
$\{\mathbf{U}_N, \Theta, N \ge 1\}$ is a strongly essentially independent, WMB model and
hence strong essential unidimensionality holds.

Proof. Analogous to that of Theorem 3.2. Omitted. □


every balanced empirical scaling

The use of $\{\bar{U}_N\}$ or more generally of a balanced empirical scale
$\{\sum_{i=1}^{N} a_{Ni} U_i\}$ as a sequence of estimators of $\Theta$ in the ordinal
scaling case was developed in Sections 3.1 and 3.2. Estimation in the ordinal scaling
sense is inappropriate when it is either desirable or required that the $\theta$ scale
be used for the ability of interest. In many such applications, the items (or at least
a common core of them) have been calibrated relative to the constructed standardized
ability scale $\theta$. Estimation of $\theta$ with known IRFs has been widely treated
in the literature (see for example Hambleton and Swaminathan, 1985, Section 5.3).
Maximum likelihood estimation (MLE) is one method of choice in this setting. The MLE
$\hat{\theta}$ converges in probability to $\theta$ under suitable regularity conditions
in the sense that, given $\Theta = \theta$, $\hat{\theta} \to \theta$ in probability as
the number of items $N \to \infty$. Only rarely, however, is it possible to provide a
simple formula for the MLE as a function of $\mathbf{U}_N$. The MLE is usually a highly
non-linear function of $\mathbf{U}_N$. Thus in the case of known IRFs it seems desirable
to seek alternatives to MLE that are based on linear formula scoring and for which
simple formulae are available. We now propose a family of such estimators, using the
results of Sections 3.1 and 3.2.


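Let

$$\bar{P}_N(\theta) = \sum_{i=1}^{N} P_i(\theta)/N, \qquad A_N(\theta) = \sum_{i=1}^{N} a_{Ni} P_i(\theta),$$

with corresponding balanced score $\sum_{i=1}^{N} a_{Ni} U_i$.

Recall from Theorem 3.2 that when $\{U_i, i \ge 1\}$ is essentially unidimensional
with respect to $\Theta$ then for each given $\Theta = \theta$,

$$\bar{U}_N - \bar{P}_N(\theta) \to 0$$

in probability as $N \to \infty$. This suggests estimating $\theta$ by
$\{\bar{P}_N^{-1}(\bar{U}_N)\}$ and also suggests for each given $\Theta = \theta$ that

$$\bar{P}_N^{-1}(\bar{U}_N) - \theta \to 0$$

in probability as $N \to \infty$ should hold. Moreover, recalling Theorem 3.5, this
result should generalize to balanced scoring with, for each $\Theta = \theta$,

$$A_N^{-1}\Big(\sum_{i=1}^{N} a_{Ni} U_i\Big) - \theta \to 0$$

in probability as $N \to \infty$. Theorem 3.6 below states that this is true provided a
slightly modified local asymptotic discrimination holds. Definition 3.8 is the
appropriate analogue of Definition 3.3.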

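Definition 3.8. Let an infinite item pool be essentially unidimensional with respect
to ability $\Theta$. Let $A_N(\theta) = \sum_{i=1}^{N} a_{Ni} P_i(\theta)$ be formed from
a balanced sequence $\{a_{Ni}\}$. Suppose for every fixed $\theta_1$ such that
$\theta_1$ is in the range $R$ of $\Theta$ that there exists $\epsilon_{\theta_1} > 0$
and an open neighborhood $N_{\theta_1}$ of $\theta_1$ such that for all
$\theta_2 \in N_{\theta_1}$, $\theta_2 \ne \theta_1$, and in the range $R$ of $\Theta$
that

$$\sum_{i=1}^{N} a_{Ni}\, \frac{P_i(\theta_2) - P_i(\theta_1)}{\theta_2 - \theta_1} \ge \epsilon_{\theta_1} \quad \text{for all } N. \qquad (37)$$

Then $\{\mathbf{U}_N, \Theta, A_N(\theta), N \ge 1\}$ is said to be locally
asymptotically discriminating (LAD) with respect to $\Theta$ and $\{a_{Ni}\}$.

Usually $A_N(\theta)$ is continuous in applications, thus making its inverse well
defined over its range. However, in order to have a theory that allows for
discontinuities, the following definition of $A_N^{-1}(u)$ will be used:

$$A_N^{-1}(u) \equiv \inf_{\theta \in R} \{\theta : A_N(\theta) \ge u\}.$$

Here $R$ denotes the range of $\Theta$. Note that $A_N^{-1}(u) = -\infty$ or $\infty$ is
possible; e.g., if $u = 1/5$ and $A_N(\theta) > 1/4$ for all $\theta$, then
$A_N^{-1}(u) = \inf R$, which is $-\infty$ when $R$ is unbounded below.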
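
The generalized inverse defined above is straightforward to compute when $A_N$ is nondecreasing. The following is a minimal sketch under assumptions that are not in the paper: the range $R$ is taken to be a bounded search interval `[lo, hi]`, the IRFs are two parameter logistic with hypothetical parameters, and bisection is used to locate the infimum. When every point of the interval satisfies $A_N(\theta) \ge u$ the function returns `lo` (the infimum of the assumed range), and when no point does it returns `math.inf`.

```python
import math


def irf_2pl(theta, a, b):
    # Two parameter logistic item response function P_i(theta).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def make_latent_scale(items, weights):
    """Return A_N(theta) = sum_i a_Ni P_i(theta) as a callable."""
    def A_N(theta):
        return sum(w * irf_2pl(theta, a, b) for w, (a, b) in zip(weights, items))
    return A_N


def generalized_inverse(A_N, u, lo=-6.0, hi=6.0, tol=1e-9):
    """inf{theta in [lo, hi] : A_N(theta) >= u} for a nondecreasing A_N.

    Returns lo if every theta qualifies and math.inf if none does.
    The bounded interval stands in for the range R (an assumption).
    """
    if A_N(lo) >= u:
        return lo
    if A_N(hi) < u:
        return math.inf
    # Invariant: A_N(lo) < u <= A_N(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if A_N(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical calibrated items and balanced weights a_Ni = a_i / sum_j a_j.
items = [
    (0.5 + 1.5 * ((7 * i) % 10) / 9.0, -2.0 + 4.0 * ((3 * i) % 11) / 10.0)
    for i in range(40)
]
total_a = sum(a for a, _ in items)
weights = [a / total_a for a, _ in items]
A_N = make_latent_scale(items, weights)

theta_true = 0.7
theta_back = generalized_inverse(A_N, A_N(theta_true))
print(theta_back)
```

Because the logistic IRFs are continuous and strictly increasing, the generalized inverse coincides with the ordinary inverse here; the infimum form only matters when $A_N$ has jumps or when $u$ falls outside its range.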

Theorem 3.6. Let $\{\mathbf{U}_N, \Theta, N \ge 1\}$ be strongly essentially
unidimensional with respect to $\Theta$. Suppose
$A_N(\theta) = \sum_{i=1}^{N} a_{Ni} P_i(\theta)$ is formed from a balanced ability
scaling $\sum_{i=1}^{N} a_{Ni} U_i$. Suppose
$\{\mathbf{U}_N, \Theta, A_N(\theta), N \ge 1\}$ is LAD with respect to $\Theta$ and
$\{a_{Ni}\}$. Then, for each given $\Theta = \theta$,

$$A_N^{-1}\Big(\sum_{i=1}^{N} a_{Ni} U_i\Big) - \theta \to 0 \qquad (38)$$

in probability as $N \to \infty$.

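Proof. Fix $\theta$. By Theorem 3.5, given $\Theta = \theta$,

$$\sum_{i=1}^{N} a_{Ni} U_i - A_N(\theta) \to 0 \qquad (39)$$

in probability as $N \to \infty$. It is an elementary lemma of probability theory that
$X_N \to X$ in probability as $N \to \infty$ if and only if each subsequence
$\{X_{N(j)}\}$ contains a further subsequence $\{X_{N(j(k))}\}$ converging to $X$ with
probability one as $k \to \infty$. Thus to prove the theorem, it suffices to select an
arbitrary subsequence $\{N(j)\}$ and then prove there exists a further subsequence
$N(j(k))$ such that

$$A_{N(j(k))}^{-1}\Big(\sum_{i} a_{N(j(k)),i}\, U_i\Big) - \theta \to 0$$

with probability one as $k \to \infty$. Choose $\{N(j)\}$. Then (39) implies that

$$\sum_{i} a_{N(j),i}\, U_i - A_{N(j)}(\theta) \to 0$$

in probability as $j \to \infty$. Then, using the above mentioned lemma, there exists a
further subsequence $N(j(k))$ for which

$$\sum_{i} a_{N(j(k)),i}\, U_i - A_{N(j(k))}(\theta) \to 0 \qquad (40)$$

with probability one. By (37) and the definition of the inverse, for all
$u_2 - A_N(\theta)$ sufficiently small in magnitude and satisfying
$u_2 \ge A_N(\theta')$ for some $\theta' \in R$, there exists $C_\theta < \infty$ such
that

$$|A_N^{-1}(u_2) - \theta| \le C_\theta\, |u_2 - A_N(\theta)| \quad \text{for all } N. \qquad (41)$$

Fix a typical point in the probability space. Now, it may be that for some
arbitrarily large $k$

$$\sum_{i} a_{N(j(k)),i}\, U_i < A_{N(j(k))}(\theta') \quad \text{for all } \theta' \in R. \qquad (42)$$

By (40) and LAD, there exists $\epsilon_k \to 0$ such that for all large $k$

$$\sum_{i} a_{N(j(k)),i}\, U_i \ge A_{N(j(k))}(\theta - \epsilon_k).$$

Thus $\sum_{i} a_{N(j(k)),i}\, U_i \ge A_{N(j(k))}(\theta')$ for some $\theta' \in R$ for
sufficiently large $k$, using LAD. Hence (42) cannot hold for arbitrarily large $k$ and
thus (41) can be applied with

$$u_2 = \sum_{i} a_{N(j(k)),i}\, U_i.$$

Thus, combining (40) with (41),

$$A_{N(j(k))}^{-1}\Big(\sum_{i} a_{N(j(k)),i}\, U_i\Big) - \theta \to 0$$

with probability one, as required. □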

Remarks. (i) Theorem 3.6 provides a large class of sequences of estimators of
$\theta$, including $\{\bar{P}_N^{-1}(\bar{U}_N)\}$, based on linear formula scoring. In
practice, one needs to compute $A_N(\theta)$ and its inverse $A_N^{-1}(u)$ to make use
of one of these estimators.

(ii) It is to be noted that Holland, Junker, and Thayer (1987) have proposed using
$\bar{P}_N^{-1}(\bar{U}_N)$ to estimate the distribution of $\Theta$ and have proved a
convergence in distribution result to justify this. Their motivation for suggesting
$\bar{P}_N^{-1}(\bar{U}_N)$ is different from ours.

(iii) It is elementary to show that (38) holding for all $\theta$ implies

$$A_N^{-1}\Big(\sum_{i=1}^{N} a_{Ni} U_i\Big) - \Theta \to 0 \qquad (43)$$

in probability as $N \to \infty$. Given the IRT context, (38) is perhaps a more
interesting formulation than (43). It does of course follow from (43) that
$\bar{P}_N^{-1}(\bar{U}_N)$ can be used as a method of estimating the distribution of
$\Theta$.

(iv) Note that (38) states that convergence in probability to individual ability
holds regardless of which of a large class of estimators is used. That is,
convergence in probability to individual ability holds for every balanced scaling.
Theorems 3.5 and 3.6 show how close the traditional statistical notion of consistency
and our psychometric notion of consistency really are. By Theorem 3.5, strong
consistency is equivalent to strong essential unidimensionality, which by Theorem 3.6
implies that a wide class of natural estimates is consistent in the ordinary
statistical sense.

(v) A version of Theorem 3.6 is possible that only deals with $\bar{U}_N$:

Theorem 3.6'. Let $\{\mathbf{U}_N, \Theta, N \ge 1\}$ be EI and LAD with respect to
$\Theta$. Then, for each given $\Theta = \theta$,

$$\bar{P}_N^{-1}(\bar{U}_N) - \theta \to 0$$

in probability as $N \to \infty$.
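
The estimator of Theorem 3.6', the inverse of the average IRF evaluated at the proportion correct, can be tried out in a small simulation. This is a hedged sketch rather than the paper's procedure: responses are generated from a locally independent two parameter logistic model with hypothetical, assumed-known item parameters, the range of ability is clipped to a search interval `[lo, hi]`, and the inverse is found by bisection.

```python
import math
import random


def irf_2pl(theta, a, b):
    # Two parameter logistic item response function P_i(theta).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def make_items(N):
    # Hypothetical calibrated (a_i, b_i) pairs.
    return [
        (0.5 + 1.5 * ((7 * i) % 10) / 9.0, -2.0 + 4.0 * ((3 * i) % 11) / 10.0)
        for i in range(N)
    ]


def mean_irf(theta, items):
    # P-bar_N(theta): average of the item response functions.
    return sum(irf_2pl(theta, a, b) for a, b in items) / len(items)


def estimate_theta(responses, items, lo=-6.0, hi=6.0, tol=1e-4):
    """Invert P-bar_N at the proportion correct, clipped to [lo, hi]."""
    u_bar = sum(responses) / len(responses)
    if mean_irf(lo, items) >= u_bar:
        return lo
    if mean_irf(hi, items) < u_bar:
        return hi  # clipping is an assumption of this sketch
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_irf(mid, items) >= u_bar:
            hi = mid
        else:
            lo = mid
    return hi


def mean_abs_error(N, theta, reps, rng):
    items = make_items(N)
    probs = [irf_2pl(theta, a, b) for a, b in items]
    err = 0.0
    for _ in range(reps):
        responses = [1 if rng.random() < p else 0 for p in probs]
        err += abs(estimate_theta(responses, items) - theta)
    return err / reps


rng = random.Random(1)
err_short = mean_abs_error(20, 0.3, 100, rng)
err_long = mean_abs_error(400, 0.3, 100, rng)
print(err_short, err_long)
```

Under these assumptions the average absolute error should shrink markedly as the test lengthens from 20 to 400 items, which is the behavior the theorem describes in the limit.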

4. A Stochastic Model for the Construction of Essentially Unidimensional Tests.

In Sections 2 and 3, the case has been made for using essentially unidimensional
IRT models both in applications to real test data and in the investigation of
theoretical issues. This approach leads to the uniqueness of the unidimensional
latent ability scale and the consistent estimation of latent ability both in the
ordinal and the fixed scaling case. This new modeling approach requires the
replacement of modeling a fixed finite length test $\mathbf{U}_N$ by modeling an
infinite item pool $\{U_i, i \ge 1\}$. $\{U_i, i \ge 1\}$ is the test that would result
were one to continue constructing items $\{U_i, i > N\}$ "in the same manner" as
$\{U_i, 1 \le i \le N\}$ was constructed. It has been stressed throughout that
essential unidimensionality is an empirically testable property, using Stout's (1987)
statistical test of unidimensionality.

Now it seems appropriate to present and study a model showing a plausible way in
which essentially unidimensional infinite item pools $\{U_i, i \ge 1\}$ can be
constructed. The actual observed test $\mathbf{U}_N$ is then obtained by terminating the
process of constructing items after $N$ items have been obtained.

We assume $\Theta$ is the ability to be measured and that the items also depend on
finitely or infinitely many other abilities $(\Theta_1, \Theta_2, \ldots)$ and that the
resulting ability space $\boldsymbol{\Theta} = (\Theta, \Theta_1, \Theta_2, \ldots)$ is
"complete" in the sense that $\boldsymbol{\Theta}$ explains the variation between
individuals in item/test performance. That is, we assume that $\{U_i, i \ge 1\}$ has a
WM, LI IRT model $\{\mathbf{U}_N, \boldsymbol{\Theta}, N \ge 1\}$ where
$\boldsymbol{\Theta} = (\Theta, \Theta_1, \ldots)$. This assumption is neither
mathematically nor psychometrically restrictive.

Assume throughout Section 4, consistent with Theorem 3.1, that
$\{\mathbf{U}_N, \Theta, N \ge 1\}$ is an essentially unidimensional WM model. It is
assumed that the representation for $\boldsymbol{\Theta}$ is orthogonal in the sense
that all $\Theta_j$, $\Theta_{j'}$ pairs are independent given $\Theta$. This assumption
basically amounts to choosing an orthogonal coordinate system for the latent ability
space and hence is not unduly restrictive.
It is assumed that all items are inherently multiply determined, as motivated by
discussion in Section 2. That is, each item can depend on one or more of the other
abilities $\Theta_j$ besides $\Theta$. Consider the construction of the $i$th item. Let

$$p_{ij} = P[\text{Item } i \text{ depends on } \Theta_j].$$

The assumption of multiply determined items then translates into the assumption that
for each $i$

$$p_{ij} > 0 \text{ for some (possibly many) } j \text{ can hold.}$$

Implicit in the introduction of the $p_{ij}$'s is the assumption that the
determination of which abilities in addition to $\Theta$ influence each item can be
viewed as a random process. (This is not the same as saying it is inherently a
random process like radioactive decay; for example, the digits of $\pi$ can from the
statistical perspective be well-modeled by a random process.) That is, the "context"
of each item can be viewed as randomly determined.

There are deliberately no model assumptions made here about other
characteristics of the items such as discrimination, difficulty, guessing, etc.
Also, no assumptions are made about the amount of item dependencies on various
dimensions $\Theta_j$, although such a refinement is possible and could be helpful.
Because in a typical aptitude or achievement test, different items are often written
by different individuals and item selection is controlled by factors such as
discrimination, difficulty, congruence with intended content domain $\Theta$, etc., the
assumption of random item context together with no assumptions about item
characteristics seems appropriate, even if there is no explicit random mechanism for
choosing item context.

Weak and natural restrictions placed on the magnitudes of the $\{p_{ij}\}$ suffice to
guarantee that essential unidimensionality holds, as is now established. Let

$N_i$ = Number of abilities besides $\Theta$ influencing item $i$

and

$N_{Nj}$ = Number of the first $N$ items dependent on $\Theta_j$.
Theorem 4.1. Suppose for $\{U_i, i \ge 1\}$ that

$$E(N_i) \le M < \infty \quad \text{for all } i \qquad (44)$$

and that for all $j$,

$$\frac{E(N_{Nj})}{N} \to 0 \quad \text{as } N \to \infty. \qquad (45)$$

Suppose independence of the "assignment" of abilities to item pairs in the sense that
for all $j$

$$P[\text{Ability } \Theta_j \text{ assigned to item } i \text{ and to item } i'] = p_{ij}\, p_{i'j}. \qquad (46)$$

Then, essential unidimensionality with respect to $\Theta$ holds in the precise sense
that for each $\theta$, $D_N(\theta)$ of (7) satisfies $D_N(\theta) \to 0$ in
probability as $N \to \infty$.

Remark. Here, "in probability" refers to the random process partially specified by
the $p_{ij}$'s that determines which $\Theta_j$'s influence which items. Thus according
to Theorem 4.1, for fixed large $N$, "most" of the infinite item pools constructed will
produce a "small" $D_N(\theta)$.

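Proof. According to the proof of Theorem 2.2,
$\operatorname{cov}(U_i, U_{i'} \mid \Theta = \theta) = 0$ for all $\theta$ unless
$U_i$, $U_{i'}$ depend on at least one common $\Theta_j$. Thus, using (46),

$$E\big[\,|\operatorname{cov}(U_i, U_{i'} \mid \Theta = \theta)|\,\big] \le P[U_i, U_{i'} \text{ depend on a common } \Theta_j] \le \sum_j p_{ij}\, p_{i'j}.$$

Note that

$$E(N_i) = \sum_j p_{ij}, \qquad E(N_{Nj}) = \sum_{i=1}^{N} p_{ij}.$$

Thus,

$$E[D_N(\theta)] \le \frac{2}{N(N-1)} \sum_{1 \le i < i' \le N} \sum_j p_{ij}\, p_{i'j} \to 0$$

as $N \to \infty$. But $X_N \ge 0$, $E X_N \to 0$ implies $X_N \to 0$ in probability
holds for arbitrary random variables $X_N$. □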
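
The bound used in the proof can be evaluated directly from a matrix of assignment probabilities. The sketch below is illustrative only: the "uniform design" (as many minor abilities as items, each assigned to each item with probability $M/N$) is a hypothetical example, and the function names are not from the paper. For that design the bound works out to $M^2/N$, so it shrinks as the pool grows. A single random assignment is also simulated to count the fraction of item pairs that share a minor ability; since a covariance of binary items is at most 1/4 in absolute value, $D_N(\theta)$ is at most a quarter of that fraction.

```python
import random
from itertools import combinations


def expected_dn_bound(p):
    """Bound sum_{i<i'} sum_j p[i][j] * p[i'][j] / C(N, 2) on E[D_N(theta)]."""
    N = len(p)
    J = len(p[0])
    total = 0.0
    for j in range(J):
        col = [p[i][j] for i in range(N)]
        s = sum(col)
        # sum over unordered pairs i < i' of p_ij * p_i'j
        total += (s * s - sum(x * x for x in col)) / 2.0
    return total / (N * (N - 1) / 2.0)


def simulated_shared_pair_fraction(p, rng):
    """Draw one independent assignment and return the share of item pairs
    that depend on at least one common minor ability."""
    N = len(p)
    J = len(p[0])
    assigned = [{j for j in range(J) if rng.random() < p[i][j]} for i in range(N)]
    shared = sum(1 for i, k in combinations(range(N), 2) if assigned[i] & assigned[k])
    return shared / (N * (N - 1) / 2.0)


def uniform_design(N, M):
    # Hypothetical design: N minor abilities, p_ij = M / N, so E(N_i) = M.
    return [[M / N] * N for _ in range(N)]


bound_50 = expected_dn_bound(uniform_design(50, 2.0))
bound_200 = expected_dn_bound(uniform_design(200, 2.0))
frac_50 = simulated_shared_pair_fraction(uniform_design(50, 2.0), random.Random(2))
print(bound_50, bound_200, frac_50)
```

In this example $E(N_{Nj}) = M$ for every $j$, so the ratio in (45) is $M/N$ uniformly in $j$.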
Suppose the designers of the test are deliberately being careful not to let too
many pairs of items depend on any one $\Theta_j$ in an effort to create contextual
balance. Then clearly assumption (46) is inappropriate and should be replaced. An
appropriate assumption is for all $i$, $i'$, $j$ that

$$P[\text{Ability } \Theta_j \text{ assigned to item } i \text{ and item } i'] \le p_{ij}\, p_{i'j}. \qquad (47)$$

Clearly the above proof is valid if (47) replaces (46). This yields:

Corollary 4.1. Suppose for $\{U_i, i \ge 1\}$ that (44), (45), and (47) hold. Then
essential unidimensionality with respect to $\Theta$ holds in the sense stated in
Theorem 4.1. □

There is a deterministic version of Corollary 4.1 that makes no assumptions
about randomness and generalizes Theorem 2.2.

Corollary 4.2. Suppose for $\{U_i, i \ge 1\}$ that the dependence of items on
abilities is such that

$$N_i \le M < \infty \quad \text{for all } i \qquad (48)$$

and that for all $j$

$$\frac{N_{Nj}}{N} \to 0 \quad \text{as } N \to \infty. \qquad (49)$$

Then essential unidimensionality with respect to $\Theta$ holds.

Proof. $\operatorname{cov}(U_i, U_{i'} \mid \Theta = \theta) = 0$ unless $U_i$,
$U_{i'}$ depend on at least one common $\Theta_j$. Thus a simple counting argument
yields

$$D_N(\theta) \le \frac{2}{N(N-1)}\, N M \max_j N_{Nj} \to 0 \quad \text{as } N \to \infty. \ \Box$$

Remark. The mild hypotheses of Corollaries 4.1 and 4.2 suggest one strategy for
essentially unidimensional test construction in the face of multiply determined
items: keep the number of abilities per item as low as possible (see (44) or (48));
keep the number of items influenced by any one ability other than $\Theta$ as low as
possible (see (45) or (49)); and, subject to these constraints, keep the number of
item pairs assigned to each ability other than $\Theta$ as low as possible (see (46) or
(47)). This last constraint can either be accomplished by a random or pseudorandom
assignment of abilities to items so that (46) tends to hold, or a deliberate effort
can be made to balance, in an experimental design sense, the assignment of minor
abilities to various items in the sense that (47) tends to hold.
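
The three quantities named in this strategy can be tabulated for any proposed assignment of minor abilities to items. The following sketch uses two hypothetical 60-item designs, neither taken from the paper: a "balanced" design in which each minor ability touches only three items, and a "clustered" design in which every item depends on one of just two minor abilities. The function and variable names are illustrative assumptions.

```python
from itertools import combinations


def design_summary(item_abilities):
    """Summarize a deterministic design.

    item_abilities[i] is the set of minor-ability indices item i depends on.
    Returns (max abilities per item, max items per ability,
             fraction of item pairs sharing a minor ability).
    """
    N = len(item_abilities)
    max_per_item = max(len(s) for s in item_abilities)  # compare with M in (48)

    counts = {}
    for s in item_abilities:
        for j in s:
            counts[j] = counts.get(j, 0) + 1
    max_items_per_ability = max(counts.values()) if counts else 0  # N_Nj in (49)

    shared = sum(1 for s, t in combinations(item_abilities, 2) if s & t)
    return max_per_item, max_items_per_ability, shared / (N * (N - 1) / 2.0)


N = 60
# Balanced (hypothetical): items 0-2 share ability 0, items 3-5 share ability 1, ...
balanced = [{i // 3} for i in range(N)]
# Clustered (hypothetical): every item depends on one of only two minor abilities.
clustered = [{i % 2} for i in range(N)]

balanced_summary = design_summary(balanced)
clustered_summary = design_summary(clustered)
print(balanced_summary)
print(clustered_summary)
```

Only pairs that share a minor ability can have nonzero conditional covariance, so the last entry bounds how much of $D_N(\theta)$ can be nonzero. The balanced design leaves about 3% of pairs at risk, the clustered one about 49%.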

There is an unavoidable paradoxical aspect to the model of this section and
indeed to the infinite item pool model of this paper. In order to arrive at a
rigorous and useful conception of essential unidimensionality it was necessary to
replace modeling of the observable finite length test $\mathbf{U}_N$ by the
unobservable (except for its initial segment $\mathbf{U}_N$) infinite length pool
$\{U_i, i \ge 1\}$. But in order to apply this new modeling framework, one must assess
the degree to which essential unidimensionality well-models the observable test
$\mathbf{U}_N$ rather than the unobservable item pool $\{U_i, i \ge 1\}$. To address
this paradox, recall that essential unidimensionality holds provided for all $\theta$

$$D_N(\theta) \to 0 \quad \text{as } N \to \infty.$$

Thus, the practitioner needs to assess, based on actual test data for $\mathbf{U}_N$,
whether, for the actual test length $N$, for all $\theta$

$$D_N(\theta) \approx 0. \qquad (50)$$

A close examination of the statistical test for essential unidimensionality in Stout
(1987) (see Section 5 and Formula (17) of that paper in particular) shows that
the test is designed precisely to assess the degree to which (50) holds for nonsparse
subtests. The Monte Carlo simulations presented in that paper justify using the
statistical test of essential unidimensionality for ability tests as short as 25
items with as few as 750 examinees.

The model of this section with its large number of parameters $\{p_{ij}\}$ is intended
for conceptual purposes and is not intended to facilitate analyses of real test data.
However, consequences of the model such as Theorem 4.1, Corollary 4.1, and Corollary
4.2 may be useful, as remarked, as guides to good unidimensional test construction.


The purpose of the paper is to present a new IRT modeling approach based on the
embedding of the test $\mathbf{U}_N$ into an infinite item pool $\{U_i, i \ge 1\}$ and
then to show the usefulness of this approach to certain fundamental test measurement
topics such as dimensionality and ability estimation. The paper provides a new
conceptualization of latent dimensionality, essential dimensionality. This
conceptualization depends on the replacement of local independence by the weaker and,
in our opinion, psychometrically more appropriate notion of essential independence.
Essential dimensionality, designed to dovetail with the empirical reality of multiply
determined items, attempts to count only the dominant dimensions. Theorems 2.1-2.3
present conditions that guarantee that essential unidimensionality holds. In
particular, dimensions distributed nondensely over items or dimensions having a minor
influence on possibly many items do not negate essential unidimensionality.

In Section 3.1, essential unidimensionality is shown in Theorem 3.2 to
characterize the consistent estimation of a unidimensional ability in the ordinal
scaling case. The ordinal scaling case holds when any monotone transformation of the
given ability scale is an acceptable choice for the ability scale to be used. The
consistent estimation of ability is precisely defined in Definition 3.2 and a slight
variant in Definition 3.6. Roughly, a test $\{U_i, i \ge 1\}$ consistently estimates
ability if all reasonable-to-use linear formula scores asymptotically estimate
different monotone transformations of the same unidimensional latent ability.
"Reasonable-to-use" is formalized by examining collections of nonsparse subtests and
balanced linear empirical scalings. In order to facilitate this development, the
concepts of marginal item response function and intrinsic ability scale are
presented. The estimation of ability in the ordinal scaling sense does not require
the IRFs to be known (i.e., calibrated).

Theorem 3.3 shows that essential unidimensionality guarantees, under the mild
regularity condition of local asymptotic discrimination of $\{U_i, i \ge 1\}$, that the
latent ability is unique up to a monotone transformation, that is, in the ordinal
scaling sense.

Section 3.2 extends the theory from collections of nonsparse subtests to
balanced linear empirical scalings, thus yielding results for a wide and natural
class of empirical scalings.

Section 3.3 addresses the estimation of ability $\Theta$ on the specified $\theta$
scale when IRFs are assumed known. Theorem 3.6 presents a large class of estimators of
$\theta$ that consistently estimate $\theta$ on the $\theta$ scale in the sense that
each such estimator satisfies, for each $\theta$, given $\Theta = \theta$,

$$A_N^{-1}\Big(\sum_{i=1}^{N} a_{Ni} U_i\Big) - \theta \to 0$$

in probability as $N \to \infty$. This includes in particular the estimator
$\bar{P}_N^{-1}(\bar{U}_N)$. Each such estimator is computable, has a simple formula,
and is based on an admissible linear formula scoring scheme in an intuitively natural
way.

Section 4 presents a conceptual model for the construction of essentially
unidimensional tests in the presence of the unavoidable empirical reality of
multidimensional items. A test developer's prescription for essentially
unidimensional test construction emerges: keep the number of abilities per item
small; keep the number of items dependent on the same ability (other than the
to-be-measured $\Theta$) small; and keep the number of item pairs assigned to the same
ability other than $\Theta$ small. It is stressed in Section 4 and throughout the paper
that essential unidimensionality, while defined for the unobservable
$\{U_i, i \ge 1\}$, is statistically testable based on data from $\mathbf{U}_N$ and that
the statistical test given in Stout (1987) is precisely designed to assess the degree
to which $\mathbf{U}_N$ is well modeled by the assumption of essential
unidimensionality.